Abstract

In  this  paper  we  analyze  Generalized  Method  of  Moments  (GMM)  estimators  for  time  series 
models  as  advocated  by  Hansen  and  Singleton.  It  is  well  known  that  these  estimators  achieve 
efficiency  bounds  if  the  number  of  lagged  observations  in  the  instrument  set  goes  to  infinity.  However, 
to  this  date  no  data  dependent  way  of  selecting  the  number  of  instruments  in  a  finite  sample  is 
available.  This  paper  derives  an  asymptotic  mean  squared  error  (MSE)  approximation  for  the 
GMM  estimator.  The  optimal  number  of  instruments  is  selected  by  minimizing  a  criterion  based  on 
the MSE approximation. It is shown that the fully feasible version of the GMM estimator is higher
order  adaptive.  In  addition  a  new  version  of  the  GMM  estimator  based  on  kernel  weighted  moment 
conditions  is  proposed.  The  kernel  weights  are  selected  in  a  data-dependent  way.  Expressions  for 
the  asymptotic  bias  of  kernel  weighted  and  standard  GMM  estimators  are  obtained.  It  is  shown  that 
standard  GMM  procedures  have  a  larger  asymptotic  bias  and  MSE  than  optimal  kernel  weighted 
GMM.  A  bias  correction  for  both  standard  and  kernel  weighted  GMM  estimators  is  proposed.  It  is 
shown  that  the  bias  corrected  version  achieves  a  faster  rate  of  convergence  of  the  higher  order  terms 
of  the  MSE  than  the  uncorrected  estimator. 

Key  Words:  time  series,  feasible  GMM,  number  of  instruments,  rate-adaptive  kernels,  higher 
order  adaptive,  bias  correction 
1.  Introduction

In recent years GMM estimators have become one of the main tools in estimating economic models
based on first order conditions for optimal behavior of economic agents. Hansen (1982) established
the asymptotic properties of a large class of GMM estimators. Based on first order asymptotic theory it
was subsequently shown by Chamberlain (1987), Hansen (1985) and Newey (1988) that GMM estimators
based on conditional moment restrictions can be constructed to achieve semiparametric efficiency
bounds.

The  focus  of  this  paper  is  the  higher  order  asymptotic  analysis  of  GMM  estimators  for  the  time 
series case. In the cross-sectional literature it is well known that using a large number of instruments can
result  in  substantial  second  order  bias  of  GMM  estimators,  thus  putting  limits  to  the  implementation 
of  efficient  procedures.  Similar  results  are  obtained  in  this  paper  for  the  time  series  case.  In  addition, 
fully  feasible,  second  order  optimal  implementations  of  efficient  GMM  estimators  for  time  series  models 
are  developed. 

In independent sampling situations feasible versions of efficient GMM estimators were implemented by,
amongst others, Newey (1990). In a time series context examples of first order efficient estimators are
Hayashi and Sims (1983), Stoica, Soderstrom and Friedlander (1985), Hansen and Singleton (1991, 1996)
and  Hansen,  Heaton  and  Ogaki  (1996).  Under  special  circumstances  and  in  a  slightly  different  context, 
Kuersteiner  (2002)  constructs  a  feasible,  efficient  GMM  estimator  for  autoregressive  models  where  the 
number  of  instruments  is  allowed  to  increase  at  the  same  rate  as  the  sample  size.  More  generally 
however,  such  expansion  rates  do  not  lead  to  consistent  estimates.  In  fact,  to  this  date  no  analysis  of 
the  optimal  expansion  rate  for  the  number  of  instruments  for  efficient  GMM  procedures  depending  on 
an  infinite  dimensional  instrument  set  has  been  provided  in  the  context  of  time  series  models.  In  this 
paper  a  data  dependent  selection  rule  for  the  number  of  instruments  is  obtained  and  a  fully  feasible 
version  of  GMM  estimators  for  linear  time  series  models  is  proposed. 

Several moment selection procedures applicable to time series data have been proposed in the literature.
Andrews (1999) considers selection of valid instruments out of a finite dimensional instrument
set containing potentially invalid instruments. Hall and Inoue (2001) propose an information criterion
based on the asymptotic variance-covariance matrix of the GMM estimator to select relevant moment
conditions from a finite set of potential moments. Neither approach applies directly to the case
of infinite dimensional instrument sets considered here. Linton (1995) analyzes the optimal choice of
bandwidth parameters for kernel estimates of the partially linear regression model based on minimizing
the asymptotic MSE of the estimator. Xiao and Phillips (1996) apply similar ideas to determine the
optimal bandwidth in the estimation of the residual spectral density in a Whittle likelihood based
regression set up. More recently Linton (1997) extended his procedure to the determination of the optimal
bandwidth  for  an  efficient  semiparametric  instrumental  variables  estimator.  Donald  and  Newey  (2001) 
use  similar  arguments  to  determine  the  optimal  number  of  base  functions  in  polynomial  approximations 
to  the  optimal  instrument.  They  analyze  higher  order  asymptotic  expansions  of  the  estimators  around 
their  true  parameter  values.  While  the  first  order  asymptotic  terms  typically  do  not  depend  on  the 
estimation  of  infinite  dimensional  nuisance  parameters  as  shown  in  Andrews  (1994)  and  Newey  (1994) 
this  is  not  the  case  for  higher  order  terms  of  the  expansions. 

In  this  paper  we  will  obtain  expansions  similar  to  the  ones  of  Donald  and  Newey  (2001)  for  the 
case  of  GMM  estimators  for  models  with  lagged  dependent  right  hand  side  variables.  This  set  up  is 
important  for  the  analysis  of  intertemporal  optimization  models  which  are  characterized  by  first  order 
conditions  of  maximization.  One  particular  area  of  application  is  asset  pricing  models. 

Minimizing the asymptotic approximation to the MSE with respect to the number of lagged
instruments leads to a feasible GMM estimator for time series models. The trade-off is between more
asymptotic efficiency, as measured by the asymptotic covariance matrix, and bias.

Full implementation of the procedure requires the specification of estimators for the criterion function
used to determine the optimal number of instruments. It is established that a plug-in estimator
for  the  optimal  number  of  instruments  leads  to  a  GMM  estimator  that  is  fully  feasible  and  achieves 
the  same  asymptotic  distribution  as  the  infeasible  optimal  estimator.  We  also  propose  a  new  kernel 
weighted  version  of  GMM.  It  is  shown  that  the  asymptotic  bias  and  MSE  can  be  reduced  if  suitable 
kernel  weights  are  applied  to  the  moment  conditions.  For  this  purpose  a  new  rate-adaptive  kernel 
that  adjusts  its  smoothness  to  the  smoothness  of  the  underlying  model  is  introduced.  In  addition,  a 
data-dependent  way  to  pick  the  optimal  kernel  is  proposed. 

Finally,  a  semiparametric  correction  of  the  asymptotic  bias  term  is  proposed.  The  bias  corrected 
version  of  the  GMM  estimator  achieves  a  faster  optimal  rate  of  convergence  of  the  higher  order  terms. 

The paper is organized as follows. Section 2 presents the time series models and introduces notation.
Section 3 introduces the kernel weighted GMM estimator, contains the analysis of higher order
asymptotic MSE terms and derives a selection criterion for the optimal number of instruments. Section
4 discusses implementation of the procedure, in particular consistent estimation of the criterion function
for optimal bandwidth selection. Section 5 analyzes the asymptotic bias of the kernel weighted
GMM  estimator  and  introduces  a  data-dependent  procedure  to  select  the  optimal  kernel.  Section  6 
discusses  non-parametric  bias  correction.  Section  7  contains  a  small  Monte  Carlo  experiment.  The 
proofs  are  collected  in  Appendix  A.  Auxiliary  Lemmas  are  collected  in  Appendix  B  which  is  available 
upon  request. 


2.  Linear  Time  Series  Models 

We consider the linear time series framework of Hansen and Singleton (1996). Let $y_t \in \mathbb{R}^p$ be a strictly
stationary stochastic process. It is assumed that economic theory imposes restrictions in the form
of a structural econometric equation on the process $y_t$. In order to describe this structural equation
we partition $y_t = [y_{t,1}, y_{t,2}', y_{t,3}']'$. Here, $y_{t,1}$ is the scalar left hand side variable, $y_{t,2}$ are the included
and $y_{t,3}$ are the excluded contemporaneous endogenous variables. The vector $X_t$ is defined to contain
the lagged dependent variables $y_{t-1}, \ldots, y_{t-r}$, or possibly a subset of them, where $r$ is known and fixed. The
structural equation then takes the form

(2.1)  $y_{t,1} = \alpha_0 + \beta_0' y_{t,2} + \beta_1' X_t + \varepsilon_t$.

The structural model also imposes restrictions on the innovations $\varepsilon_t$. More specifically, $\varepsilon_t$ is strictly
stationary with $E\varepsilon_t = 0$ and follows a Moving-Average (MA) process of order $m-1$ for $m \geq 1$, where
again, $m$ is assumed known and finite. We denote the autocovariance function of $\varepsilon_t$ by $\gamma_j = E\varepsilon_t \varepsilon_{t-j}$
with $\gamma_j = 0$ for $|j| \geq m$.
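
The cut-off of the autocovariance function at lag $m$ can be checked with a small computation. The sketch below is only an illustration and rests on simplifying assumptions that are not part of the paper: the innovation is scalar with variance `sigma2`, and the MA coefficients in `theta` are hypothetical values.

```python
def ma_autocovariance(theta, sigma2, j):
    """Autocovariance gamma_j of eps_t = sum_i theta[i] * u_{t-i}.

    Assumption for illustration: u_t is a scalar white noise with
    variance sigma2. len(theta) = m, so the process is MA(m - 1).
    """
    j = abs(j)
    m = len(theta)
    if j >= m:
        # No overlapping innovations, hence zero covariance.
        return 0.0
    return sigma2 * sum(theta[i] * theta[i + j] for i in range(m - j))


# Hypothetical MA(1) example, i.e. m = 2.
theta = [1.0, 0.5]
gammas = [ma_autocovariance(theta, 1.0, j) for j in range(4)]
print(gammas)  # [1.25, 0.5, 0.0, 0.0]
```

With $m = 2$ the autocovariances vanish from lag 2 onward, which is the property the moment restrictions of the model exploit.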

Letting $\beta = [\beta_0', \beta_1']' \in \mathbb{R}^d$ and collecting all the regressors in $x_t$ where $x_t' = [y_{t,2}', X_t']$ we can
write (2.1) as $y_{t,1} = \alpha_0 + \beta' x_t + \varepsilon_t$. An alternative representation of (2.1) is obtained by setting
$a(L,\beta) = a_0 + a_1 L + \ldots + a_r L^r$ with $1 \times p$ vectors $a_i$ such that $a(L,\beta) y_t = \alpha_0 + \varepsilon_t$. Note that the $a_i$ are
subject to exclusion and normalization restrictions implied by $\beta$.

In addition to the structural equation (2.1) we also assume that the reduced form of $y_t$ admits a
representation as a vector autoregressive moving average (VARMA) process $A(L) y_t = A(1)\mu_y + B(L) u_t$
such that there exists an infinite moving average representation

(2.2)  $y_t = \mu_y + A^{-1}(L) B(L) u_t$.

Here, $\mu_y \in \mathbb{R}^p$ is a constant and $u_t$ is a strictly stationary and conditionally homoskedastic martingale
difference sequence.

In order to completely relate model (2.1) to the generating process (2.2) we define additional $(p-1) \times p$
matrices of lag polynomials $A_1(L)$ and $B_1(L)$ such that $A_1(L) y_t = B_1(L) u_t + \alpha_1$. The matrices $A_1(L)$
and $B_1(L)$ satisfy $[a'(L,\beta), A_1'(L)]' = A(L)$, $[b'(L), B_1'(L)]' = B(L)$ and $[\alpha_0, \alpha_1']' = A(1)\mu_y$. It follows
from this representation that the structural innovations $\varepsilon_t$ are related to the reduced form innovations
by $\varepsilon_t = b(L) u_t$ where $b(L) = b_0 + b_1 L + \ldots + b_{m-1} L^{m-1}$ and $b_i \in \mathbb{R}^p$ are $1 \times p$ vectors of constants.

We assume that $A(L)$ and $B(L)$ have all their roots outside the unit circle and that all elements of
$A(L)$ and $B(L)$ are finite order polynomials in $L$. Economic theory is assumed to provide restrictions on
the polynomials $a(L,\beta)$ and $b(L)$ such that their degrees are known to the investigator. No restrictions
are assumed to be known about the polynomials $A_1(L)$ and $B_1(L)$. In particular their degrees in
$L$ are unknown, although assumed finite. The investigator is concerned with inference regarding the
parameter vector $\beta = [\beta_0', \beta_1']'$ while $b(L)$, $A_1(L)$ and $B_1(L)$ are treated as nuisance parameters.

The economic model (2.1) implies moment restrictions of the form

(2.3)  $E(\varepsilon_{t+m} y_{t-j}) = 0$ for all $j \geq 0$.

These  moment  restrictions  are  the  basis  for  the  formulation  of  GMM  estimators.  Alternatively,  the 
moment  restrictions  (2.3)  are  often  implied  by  economic  theory  and  then  lead  to  the  formulation  of  a 
structural  model  of  the  form  (2.1).  A  well  known  example  is  asset  pricing  models. 

In addition to the structural restrictions of Equation (2.1) we impose the following formal Assumptions
on $u_t$ and $A(L)$, $B(L)$ and $b(L)$.

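Assumption A. Let $u_t \in \mathbb{R}^p$ be strictly stationary and ergodic, with $E(u_t | \mathcal{F}_{t-1}) = 0$, $E(u_t u_t' | \mathcal{F}_{t-1}) = \Sigma$,
where $\Sigma$ is a positive definite symmetric matrix of constants. Let $u_t^i$ be the $i$-th element of $u_t$ and
$\mathrm{cum}_{i_1,\ldots,i_k}(t_1,\ldots,t_{k-1})$ the $k$-th order cross cumulant of $u_{t_1}^{i_1}, \ldots, u_{t_k}^{i_k}$. Assume that

$\sum_{t_1=-\infty}^{\infty} \cdots \sum_{t_{k-1}=-\infty}^{\infty} |\mathrm{cum}_{i_1,\ldots,i_k}(t_1,\ldots,t_{k-1})| < \infty$ for $k \leq 8$.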

Assumption B. The lag polynomial $C(L)$ with coefficient matrices $C_j$ is defined as $C(L) = A^{-1}(L) B(L)$
where $A(L)$ and $B(L)$ are $p \times p$ matrices of finite order polynomials in $L$ such that $\det A(z) \neq 0$
and $\det B(z) \neq 0$ for $|z| \leq 1$. Moreover, assume $b(z) \neq 0$ for $|z| \leq 1$. Let $p_a$ be the degree of
the polynomial $A(L)$ and let $\lambda_1$ be the root of maximum modulus of $\det(z^{p_a} A(z^{-1})) = 0$. Let
$f_\varepsilon(\lambda) = (2\pi)^{-1} b(e^{i\lambda}) \Sigma b(e^{-i\lambda})'$ which can equivalently be written as $f_\varepsilon(\lambda) = (2\pi)^{-1} \sigma^2 |\theta(e^{i\lambda})|^2$ for
some constant $\sigma^2$ and lag polynomial $\theta(L) = 1 - \theta_1 L - \ldots - \theta_{m-1} L^{m-1}$. Let $p_b$ be the degree of the
polynomial $\tilde{B}(L) = \theta(L) B(L)$ and let $\tilde{\lambda}_1$ be the root of maximum modulus of $\det(z^{p_b} \tilde{B}(z^{-1})) = 0$.
Define $\lambda = \max(\lambda_1, \tilde{\lambda}_1)$. Assume that $\lambda \in (0,1)$. Define the infinite dimensional instrument vector
$z_{t,\infty} = (y_t', y_{t-1}', \ldots)'$ and let $P' = \mathrm{Cov}(x_{t+m}, z_{t,\infty}')$. Assume that $P$ has full column rank.

Remark 1. The column rank assumption for $P$ is needed for identification (see Kuersteiner (2001)
for an extensive discussion of this point). Assumption (B) guarantees that $f_\varepsilon(\lambda) \neq 0$ for $\lambda \in [-\pi, \pi]$.
Then $1/f_\varepsilon(\lambda)$ exists and corresponds to the spectral density of an AR(m-1) model.
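
As a numerical illustration of the two ways of writing the spectral density of an MA(m-1) innovation, the following sketch evaluates $(2\pi)^{-1}\sigma^2|\theta(e^{i\lambda})|^2$ directly and via the Fourier sum of the autocovariances, and confirms that the density stays away from zero when the root of $\theta(z)$ lies outside the unit circle. The scalar setting and the values `theta = [0.5]`, `sigma2 = 2.0` are assumptions chosen for illustration only.

```python
import cmath
import math


def ma_coefficients(theta):
    # Coefficients of theta(L) = 1 - theta_1 L - ... - theta_{m-1} L^{m-1}.
    return [1.0] + [-t for t in theta]


def f_eps_polynomial(theta, sigma2, lam):
    """Spectral density computed as sigma2 * |theta(e^{i lam})|^2 / (2 pi)."""
    z = cmath.exp(1j * lam)
    value = sum(c * z ** k for k, c in enumerate(ma_coefficients(theta)))
    return sigma2 * abs(value) ** 2 / (2.0 * math.pi)


def f_eps_autocovariance(theta, sigma2, lam):
    """Same density computed as (2 pi)^{-1} * sum_j gamma_j cos(lam * j)."""
    c = ma_coefficients(theta)
    m = len(c)
    total = 0.0
    for j in range(-(m - 1), m):
        gamma_j = sigma2 * sum(c[i] * c[i + abs(j)] for i in range(m - abs(j)))
        total += gamma_j * math.cos(lam * j)
    return total / (2.0 * math.pi)


theta = [0.5]  # hypothetical: theta(z) = 1 - 0.5 z has its root at z = 2
grid = [-math.pi + k * math.pi / 4 for k in range(9)]
max_gap = max(
    abs(f_eps_polynomial(theta, 2.0, lam) - f_eps_autocovariance(theta, 2.0, lam))
    for lam in grid
)
min_density = min(f_eps_polynomial(theta, 2.0, lam) for lam in grid)
print(max_gap, min_density)
```

Because the density is bounded away from zero on the grid, its reciprocal is well defined there, in line with the remark above.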

The fact that $u_t$ is a martingale difference sequence arises naturally in rational expectations models.
In our context it is needed together with the conditional homoskedasticity assumption to guarantee that
the optimal GMM weight matrix is of a sufficiently simple form. This allows us to construct estimates of
the bias terms converging fast enough for bias correction and optimal number of instruments selection.

The conditional homoskedasticity condition $E(u_t u_t' | \mathcal{F}_{t-1}) = E u_t u_t'$ is restrictive as it rules out time
changing  variances.  Relaxing  this  restriction  results  in  more  complicated  GMM  weight  matrices  of  the 
type  analyzed  in  Kuersteiner  (1997,  2001).  In  principle  the  higher  order  moment  restriction  implied  by 
conditional  homoskedasticity  could  be  used  in  addition  to  the  conditions  (2.3).  The  resulting  estimator 
is  however  nonlinear  and  will  not  be  considered  here. 

The  summability  assumption  for  the  cumulants  limits  the  temporal  dependence  of  the  innovation 
process. Andrews (1991) shows for $k = 4$ that the summability condition on the cumulants is implied
by a strong mixing assumption for $u_t$. The cumulant summability condition used here is similar to but
slightly stronger than the second part of Condition A in Andrews (1991). What is needed both in
Andrews (1991) and here are restrictions on the eighth-moment dependence of the underlying process
$u_t$.

Infeasible efficient GMM estimation for $\beta$ is based on exploiting all the implications of the moment
restriction (2.3). In our context this is equivalent to choosing all lagged observations as instruments.
An infeasible estimator of $\beta$ based on $z_{t,\infty}$, where $z_{t,\infty}$ is defined in Assumption (B), is used as a
reference point around which we expand feasible versions of the estimator.

For this purpose let $\Omega = \sum_{l=-m+1}^{m-1} \gamma_l \Omega(l)$ with $\Omega(l) = \mathrm{Cov}(z_{t,\infty}, z_{t-l,\infty}')$ and $D = P' \Omega^{-1} P$ where $P$
is defined in Assumption (B). A detailed analysis of these infinite dimensional matrices can be found
in Kuersteiner (2001) and Appendix (B.2). The infeasible estimator of $\beta$ is given by

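$\beta_{n,\infty} = D^{-1} P' \Omega^{-1} \frac{1}{n} \sum_{t=1}^{n-m} (y_{t+m,1} - \mu_1)(z_{t,\infty} - 1_\infty \otimes \mu_y)$

where $1_\infty$ is an infinite dimensional vector containing the element 1 and $\mu_1$ is the first element of $\mu_y$.

Let $d_0 = P' \Omega^{-1} n^{-1/2} \sum_t (z_{t,\infty} - 1_\infty \otimes \mu_y) \varepsilon_{t+m}$ such that $\sqrt{n}(\beta_{n,\infty} - \beta) = D^{-1} d_0$ almost surely. It can
be shown that $D^{-1} d_0 \to_d N(0, D^{-1})$ as $n \to \infty$ under the assumptions made about $y_t$ and $\varepsilon_t$.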

For any fixed integer $M$, let $z_{t,M} = (y_t', y_{t-1}', \ldots, y_{t-M+1}')'$ be a finite dimensional vector of instruments.
An approximate version $\beta_{n,M}$ of $\beta_{n,\infty}$ is then based on $D_M = P_M' \Omega_M^{-1} P_M$ and $z_{t,M}$ where $P_M$ and $\Omega_M$
are defined in the same way as $P$ and $\Omega$ with $z_{t,\infty}$ replaced by $z_{t,M}$. It then follows that
$\sqrt{n}(\beta_{n,M} - \beta) - D^{-1} d_0 \to 0$ as $n, M \to \infty$. The last statement is no longer true, at least not without specifying the rate
at which $M$ goes to infinity, once $\beta_{n,M}$ is replaced by a feasible estimator $\hat{\beta}_{n,M}$ where $\hat{\beta}_{n,M}$ is defined
in the same way as $\beta_{n,M}$ but with $P_M$ and $\Omega_M$ replaced by estimates $\hat{P}_M$ and $\hat{\Omega}_M$. We call $\hat{\beta}_{n,\hat{M}}$ a
fully feasible estimator if $\hat{M}$ is a function of the data alone. A more detailed definition of $\hat{\beta}_{n,M}$ is given
in Equation (3.2) while data-dependent selection of $M$ is discussed in Section 4.


3.  Kernel  Weighted  GMM 

In  this  paper  a  generalized  class  of  GMM  estimators  based  on  kernel  weighted  moment  restrictions  is 
introduced.  Conventional  GMM  estimators  are  based  on  using  the  first  M  of  the  moment  restrictions 
(2.3). More generally one can consider non-random weights $k(j,M)$ such that

$k(j,M) E\varepsilon_{t+m} y_{t-j} = 0$.

The function $k(j,M)$ is a generalized kernel weight. For the special case where $k(j,M) = k(j/M)$,
$k(j,M)$ is a standard kernel function. The truncated kernel is $k(j/M) = \{|j/M| \leq 1\}$ where we use $\{.\}$
to denote the indicator function. The general kernel weighted approach therefore covers the standard
GMM procedure as a special case when the truncated kernel is used. In Section 5 it is shown that many
kernel functions reduce the higher order bias of GMM and that there always exists a kernel function
that dominates the traditional truncated kernel in terms of higher order MSE.

We now describe the kernel weighted GMM estimator $\hat{\beta}_{n,M}$. Define the matrix

$k_M = \mathrm{diag}(k(0,M), \ldots, k(M-1,M))$

having kernel weight $k(j-1,M)$ in the $j$-th diagonal element and zeros otherwise. Let
$K_M = (k_M \otimes I_p)$ where $I_p$ is the $p$-dimensional identity matrix. An instrument selection matrix
$S_M(t) = \mathrm{diag}(\{t \geq 1\}, \ldots, \{t \geq M\})$ is introduced to exclude instruments for which there is no data in the
sample. The vector of available instruments is denoted by $\tilde{z}_{t,M} = (S_M(t) \otimes I_p)(z_{t,M} - 1_M \otimes \bar{y})$ where
$\bar{y} = n^{-1} \sum_{t=1}^{n} y_t$.

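An estimate of the weight matrix $\Omega_M$ is obtained as follows. We define $\hat{\Omega}_M(l) = \frac{1}{n} \sum_t \tilde{z}_{t,M} \tilde{z}_{t-l,M}'$.
The optimal weight matrix is then given by

(3.1)  $\hat{\Omega}_M = \sum_{l=-m+1}^{m-1} \hat{\gamma}(l) \hat{\Omega}_M(l)$

where $\hat{\gamma}(l) = \frac{1}{n} \sum_t \hat{\varepsilon}_t \hat{\varepsilon}_{t-l}$ and $\hat{\varepsilon}_t = a(L, \tilde{\beta}_{n,M})(y_{t+m} - \bar{y})$ for some consistent first stage estimator
$\tilde{\beta}_{n,M}$. For $M$ fixed and possibly small, it is well known that such an estimator can be obtained from
standard inefficient GMM procedures where $\hat{\Omega}_M = I_{Mp}$.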

Let $Z_M$ be the matrix of stacked instruments $Z_M = [\tilde{z}_{\max(1,r-m+1),M}, \ldots, \tilde{z}_{n-m,M}]'$ and
$X = [x_{\max(m+1,r+1)} - \bar{x}, \ldots, x_n - \bar{x}]'$ the matrix of regressors. Also, $Y$ is the stacked vector of the first
demeaned element in $y_t$. Then define the $d \times Mp$ matrix $\hat{P}_M' = n^{-1} X' Z_M$ as well as the $Mp \times 1$ vector
$\hat{P}_M^y = n^{-1} Z_M' Y$. Let $\hat{\Xi}_M = K_M \hat{\Omega}_M^{-1} K_M$. Assuming that $M$ is such that $M \geq d/p$, where $d$ is the
dimension of the parameter space, the estimator $\hat{\beta}_{n,M}$ can now be written as

(3.2)  $\hat{\beta}_{n,M} = (\hat{P}_M' \hat{\Xi}_M \hat{P}_M)^{-1} \hat{P}_M' \hat{\Xi}_M \hat{P}_M^y$.
For the truncated kernel with $K_M = I_{Mp}$, (3.2) is the standard GMM formula. The effects of using
kernel weighted moments can be inferred from (3.2). The kernel matrix $K_M$ distorts efficiency by using
$\hat{\Xi}_M$ instead of the optimal $\hat{\Omega}_M^{-1}$ as weight matrix. As is shown below, these effects are second order
for suitable choices of the kernel function $k(j,M)$ and bandwidth $M$. It is also shown that the second
order loss of efficiency is more than compensated by a reduction of the second order bias for suitably
chosen kernel functions.
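
To make the algebra of the kernel weighted estimator concrete, the sketch below evaluates $(\hat{P}_M' K_M \hat{\Omega}_M^{-1} K_M \hat{P}_M)^{-1} \hat{P}_M' K_M \hat{\Omega}_M^{-1} K_M \hat{P}_M^y$ in the simplest possible setting. It assumes a single parameter and a scalar process ($d = p = 1$), and all numerical inputs (`p_hat`, `omega_hat`, the Bartlett weights) are hypothetical values rather than estimates from data. The helper `solve` is a plain Gaussian elimination written here only to keep the example free of external libraries.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def kernel_weighted_gmm(p_hat, py_hat, omega_hat, weights):
    """Scalar-parameter version of the kernel weighted GMM formula.

    p_hat, py_hat: sample moment vectors of length M (d = p = 1 assumed).
    omega_hat: symmetric M x M weight matrix estimate.
    weights: kernel weights k(0, M), ..., k(M - 1, M).
    """
    kp = [k * v for k, v in zip(weights, p_hat)]
    kpy = [k * v for k, v in zip(weights, py_hat)]
    u = solve(omega_hat, kp)  # Omega^{-1} K P
    denom = sum(a * b for a, b in zip(kp, u))
    numer = sum(a * b for a, b in zip(u, kpy))  # relies on Omega being symmetric
    return numer / denom


M = 3
bartlett_weights = [1.0 - j / M for j in range(M)]
p_hat = [1.0, 0.5, 0.25]
py_hat = [0.7 * v for v in p_hat]  # moments generated exactly by beta = 0.7
omega_hat = [[2.0, 0.5, 0.0], [0.5, 2.0, 0.5], [0.0, 0.5, 2.0]]
beta_hat = kernel_weighted_gmm(p_hat, py_hat, omega_hat, bartlett_weights)
print(beta_hat)
```

Since the moment vector is constructed to satisfy $\hat{P}_M^y = 0.7\,\hat{P}_M$ exactly, any choice of kernel weights returns 0.7 up to rounding; with noisy moments the weights would change the estimate.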

We now turn to the formal requirements the kernel weight function $k(.,.)$ has to satisfy. We first
define the constant $s$ which plays a role in determining the rate of convergence of $\beta_{n,M}$ to $\beta_{n,\infty}$.

Definition 3.1. Let $\lambda_1$ and $\tilde{\lambda}_1$ be as defined in Assumption (B). Let $s_1$ be the multiplicity of $\lambda_1$ and
$\tilde{s}_1$ the multiplicity of $\tilde{\lambda}_1$. Define $s = 2s_1 - 1/2$ if $\lambda_1 > \tilde{\lambda}_1$, $s = 2s_1 + 3/2\,\tilde{s}_1 - 1/2$ if $\lambda_1 = \tilde{\lambda}_1$ and
$s = 3/2\,\tilde{s}_1 - 1/2$ if $\lambda_1 < \tilde{\lambda}_1$.

Assumption C. The kernel function $k(j,M)$ is regular if $k(j,M) = k(j/M)$. Then $k(.)$ satisfies
$k : \mathbb{R} \to [-1,1]$, $k(0) = 1$, $k(x) = k(-x)$ $\forall x \in \mathbb{R}$, $k(x) = 0$ for $|x| > 1$. $k(.)$ is continuous at 0 and at all but a
finite number of points. For $q \in (0,\infty)$ there exists a constant $k_q$ such that $k_q = \lim_{x \to 0} (1 - k(x)) / |x|^q$.
We distinguish: i) $k_q = 0$ for all $q \in (0,\infty)$, ii) $k_q \neq 0$ for some $q \in (0,\infty)$.

Assumption D. The kernel function $k(.,.)$ is rate-adaptive if it satisfies $|k(x,y)| < c < \infty$ $\forall (x,y) \in \mathbb{R} \times \mathbb{R}_+$,
$k(-x,y) = k(x,y)$, $k(0,y) = 1$ for all $y \in \mathbb{R}_+$ and $k(x,y) = 0$ for $|x| > 1$. Furthermore, for $\lambda$
defined in Assumption (B) and $c_0 = -\log(\lambda)$, $\lim_{y \to \infty} (1 - k(x/y, y^s \lambda^y)) / (y^s \lambda^y) = c_0^q c_1 |x|^q$ for all $x \in \mathbb{R}$
and some constant $c_1$.

Assumption (C) corresponds to the assumptions made in Andrews (1991) except that we also
require $k(x) = 0$ for $|x| > 1$. This assumption ensures that only a finite number of moment conditions,
controlled  by  the  bandwidth  parameter,  are  used  in  estimation.  The  assumption  could  be  relaxed  at  the 
cost  of  having  to  introduce  additional  bandwidth  parameters  to  estimate  the  optimal  weight  matrix. 
This  seems  unattractive  from  a  practical  point  of  view  and  is  not  pursued  here. 

Assumption  (C)  rules  out  certain  parametric  kernel  functions  such  as  the  Quadratic  Spectral  kernel 
but  is  satisfied  by  a  number  of  well  known  kernels  such  as  the  Truncated,  Bartlett,  Parzen  and  Tukey- 
Hanning  kernels. 
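
The constant $k_q = \lim_{x \to 0} (1 - k(x))/|x|^q$ can be approximated numerically for the kernels just named. The sketch below uses the standard textbook forms of the Truncated, Bartlett, Parzen and Tukey-Hanning kernels; the evaluation point `x = 1e-3` is an arbitrary small number chosen for illustration, so the printed values are approximations of the limits, not exact constants.

```python
import math


def truncated(x):
    return 1.0 if abs(x) <= 1.0 else 0.0


def bartlett(x):
    return max(0.0, 1.0 - abs(x))


def parzen(x):
    a = abs(x)
    if a <= 0.5:
        return 1.0 - 6.0 * a ** 2 + 6.0 * a ** 3
    if a <= 1.0:
        return 2.0 * (1.0 - a) ** 3
    return 0.0


def tukey_hanning(x):
    return 0.5 * (1.0 + math.cos(math.pi * x)) if abs(x) <= 1.0 else 0.0


def approx_k_q(kernel, q, x=1e-3):
    """Finite-x approximation of k_q = lim_{x -> 0} (1 - k(x)) / |x|^q."""
    return (1.0 - kernel(x)) / abs(x) ** q


for name, kernel, q in [
    ("truncated", truncated, 1),
    ("bartlett", bartlett, 1),
    ("parzen", parzen, 2),
    ("tukey-hanning", tukey_hanning, 2),
]:
    print(name, q, approx_k_q(kernel, q))
```

The truncated kernel gives zero for every $q$ (case i of Assumption C), while the Bartlett kernel has a non-zero constant at $q = 1$ and the Parzen and Tukey-Hanning kernels at $q = 2$ (case ii).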

In the case of regular kernels the constant $k_q$ measures the higher order loss in efficiency induced by
the kernel function which is proportional to $1 - k(i/M) = k_q M^{-q} |i|^q$ for large $M$ and some $q$ such that
$k_q \neq 0$. For rate-adaptive kernels we obtain an efficiency loss of $1 - k(i/M, M^s \lambda^M) = M^s \lambda^M c_0^q |i|^q k_q$
which, as will be shown, is of the same order of magnitude as the efficiency loss due to truncating the
number of instruments. The kernels are called rate-adaptive because their smoothness locally at zero
adapts to the smoothness $\lambda$ of the model. In Section 4 it is shown how the argument $M^s \lambda^M$ in $k(.,.)$
can be replaced by an estimate.

Kernel functions that satisfy Assumption (D) can be generated by exploiting the chain rule of
differentiation. Consider functions of the form

(3.3)  $\phi(v,z) = (2 - z(-\log(z))^q) v + (z(-\log(z))^q - 1) v^2$

for $v \in [-1,1]$ and some non-negative integer $q$. Then $\phi(1,z) = 1$, $\phi(0,z) = 0$,
$\partial\phi(v,z)/\partial v|_{v=1} = z(-\log(z))^q$ for all $z$. It thus follows that $z$ parametrizes the partial derivative of $\phi$ with respect to $v$.
The constant $q$ is chosen in accordance with a kernel $k(x)$ to which $\phi(v,z)$ is applied and for which
$k_q \neq 0$. The rate-adaptive kernel $k(j,M)$ is then obtained as

(3.4)  $k(j,M) = \phi(k(j/M), M^s \lambda^M)$

for any kernel $k(j/M)$ satisfying Assumption (C). It is shown in the proof of Proposition (3.4) that
$\lim_{M \to \infty} (1 - \phi(k(i/M), M^s \lambda^M)) / (M^s \lambda^M) = |i|^q k_q c_0^q$. Also note that $\int \phi(k(x), M^s \lambda^M)^2 dx < \infty$ uniformly
in $M \in (0,\infty]$ as long as $\lambda \in (0,1)$ and $\int k(x)^2 dx$ exists. The latter is the case for all kernels
satisfying Assumption (C). We use the short hand notation $\int \phi(x)^2 dx = \lim_{M \to \infty} \int \phi(k(x), M^s \lambda^M)^2 dx$.
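
The three stated properties of the generating function, value one at $v = 1$, value zero at $v = 0$, and slope $z(-\log z)^q$ at $v = 1$, can be verified numerically. The sketch below does so and then builds kernel weights by composing the function with a Bartlett kernel. The values `lam = 0.5`, `s = 1.5`, `M = 20` and the choice of the Bartlett kernel are hypothetical and serve only as an illustration; the function names are not from the paper.

```python
import math


def phi(v, z, q):
    """Quadratic in v with phi(1) = 1, phi(0) = 0 and slope z(-log z)^q at v = 1."""
    a = z * (-math.log(z)) ** q
    return (2.0 - a) * v + (a - 1.0) * v * v


def bartlett(x):
    return max(0.0, 1.0 - abs(x))


def rate_adaptive_kernel(j, M, lam, s, q, base_kernel):
    """Weight phi(k(j/M), M^s lam^M); requires 0 < M^s lam^M < 1."""
    z = M ** s * lam ** M
    return phi(base_kernel(j / M), z, q)


# Check the slope at v = 1 with a central difference (exact for quadratics
# up to rounding error).
z, q, h = 0.2, 2, 1e-4
slope = (phi(1.0 + h, z, q) - phi(1.0 - h, z, q)) / (2.0 * h)
target = z * (-math.log(z)) ** q

# Kernel weights for lags j = 0, ..., M - 1 under the hypothetical settings.
M, lam, s = 20, 0.5, 1.5
weights = [rate_adaptive_kernel(j, M, lam, s, 1, bartlett) for j in range(M)]
print(slope, target)
print(weights[:3])
```

With these settings the first weight equals one, the weights decline with the lag, and the weight at $|j| = M$ is zero because the base kernel vanishes there.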
The bandwidth parameter $M$ is chosen such that the approximate MSE of a weighted sum of
the elements of $\hat{\beta}_{n,M}$ is minimized. We use the Nagar (1959) type approximation to the MSE used in
Donald and Newey (2001). Let $\hat{\beta}_{n,M}$ be stochastically approximated by $b_{n,M}$ such that
$n^{1/2}(\hat{\beta}_{n,M} - \beta) = b_{n,M} + r_{n,M}$ where $r_{n,M}$ is an error term. Define the approximate mean squared error $\varphi_n(M,\ell,k(.))$ of
$\hat{\beta}_{n,M}$ as in Donald and Newey (2001) such that for $\ell \in \mathbb{R}^d$ with $\ell'\ell = 1$,

$\ell' D^{1/2} E(b_{n,M} b_{n,M}') D^{1/2} \ell = 1 + \varphi_n(M,\ell,k(.)) + R_{n,M}$

and require that the error terms $r_{n,M}$ and $R_{n,M}$ satisfy

(3.5)  $\frac{\|r_{n,M}\|^2 + R_{n,M}}{\varphi_n(M,\ell,k(.))} = o_p(1)$ as $M \to \infty$, $n \to \infty$, $\sqrt{M}/n \to 0$.

The only difference to Donald and Newey (2001) is that in our case $\varphi_n(M,\ell,k(.))$ is an unconditional
expectation. As noted by Donald and Newey, the approximation is only valid for $M \to \infty$. Given the
efficiency of $\beta_{n,\infty}$ this is the case of interest in the context of this paper.

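Proposition 3.2. Suppose Assumptions (A) and (B) hold and $\ell \in \mathbb{R}^d$ with $\ell'\ell = 1$. Let $\Gamma_{t-s}^{\varepsilon x} = E\varepsilon_t x_s$
and define $f_{\varepsilon x}(\lambda) = \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \Gamma_j^{\varepsilon x} e^{-i\lambda j}$. Define

(3.6)  $\mathcal{A}_1 = p (4\pi)^{-1} \int_{-\pi}^{\pi} f_{\varepsilon x}(\lambda) f_\varepsilon^{-1}(\lambda) d\lambda$.

Let $\mathcal{A} = \ell' D^{-1/2} \mathcal{A}_1 \mathcal{A}_1' D^{-1/2} \ell$ and $\mathcal{B}_1 = \lim_{M \to \infty} (1 - \sigma_{1M}) / (M^{2s} \lambda^{2M})$ with
$\sigma_{1M} = \ell' D^{-1/2} P_M' \Omega_M^{-1} P_M D^{-1/2} \ell$.
Also, $\mathcal{B}^{(q)} = \lim_{M \to \infty} \sigma_{2M} / (M^{2s} \lambda^{2M})$ where $\sigma_{2M} = \ell' D^{-1/2} b_M' \Omega_M b_M D^{-1/2} \ell$,
$b_M' = Q_M (I_{Mp} - P_M (P_M' \Omega_M^{-1} P_M)^{-1} P_M' \Omega_M^{-1})$ and $Q_M = P_M' (I_{Mp} - K_M) \Omega_M^{-1} + P_M' \Omega_M^{-1} (I_{Mp} - K_M)$. Then,
i) for $k(.)$ such that Assumption (C-ii) is satisfied it follows that

$\varphi_n(M,\ell,k(.)) = O(M^2/n) + O(M^{-2q})$.

ii) for $k(.,.)$ defined in (3.4) such that Assumption (D) holds, $n, M \to \infty$ and $M^{2s-2} \lambda^{2M} n \to 1/\kappa$ with $0 < \kappa < \infty$,

$\lim_{n \to \infty} n/M^2 \, \varphi_n(M,\ell,k(.)) = \mathcal{A} (\int_{-\infty}^{\infty} \phi(x)^2 dx)^2 + c_0^{2q} k_q^2 \mathcal{B}^{(q)}/\kappa + \mathcal{B}_1/\kappa$.

iii) for $k(x) = \{|x| \leq 1\}$, $n, M \to \infty$ and $M^{2s-2} \lambda^{2M} n \to 1/\kappa$ with $0 < \kappa < \infty$,

$\lim_{n \to \infty} n/M^2 \, \varphi_n(M,\ell,k(.)) = 4\mathcal{A} + \mathcal{B}_1/\kappa$.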

This result shows that using standard kernels introduces variance terms of order $O(M^{-2q})$ that are
larger than for the truncated kernel. Using the rate-adaptive kernels overcomes this problem, leading
to variance terms of the same order as in the standard GMM case. Nevertheless, kernel weighting
introduces additional variance terms of order $M^{2s} \lambda^{2M}$. Intuitively, the kernel function distorts the
optimal weight matrix resulting in an increased variance of higher order terms in the expansion. As
will be shown, this increased variance can be traded off against a reduction in the bias by an appropriate
choice of the kernel function. Since any kernel other than the rate-adaptive kernels leads to slower rates
of convergence, only rate-adaptive kernels will be considered from now on.

An immediate corollary resulting from Proposition (3.2) is that the feasible estimator has the same
asymptotic distribution as the optimal infeasible estimator as long as $M/\sqrt{n} \to 0$.

Corollary 3.3. Assume that the assumptions of Proposition (3.2) hold. If $n, M \to \infty$ and $M/\sqrt{n} \to 0$ as
$n \to \infty$ then $\sqrt{n}(\hat{\beta}_{n,M} - \beta) - D^{-1} d_0 = o_p(1)$.

Because $\mathcal{B}^{(q)}$ and $\mathcal{B}_1$ are difficult to estimate we do not minimize $\varphi_n(M,\ell,k(.))$ directly. Instead we
propose the following criterion.

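Proposition 3.4. Let $M^*$ minimize $\varphi_n(M,\ell,k(.))$. Then, as $n \to \infty$, $\tilde{M}^*/M^* \to 1$ where

(3.7)  $\tilde{M}^* = \arg\min_{M \in I} MIC(M) = \arg\min_{M \in I} \frac{M^2}{n} \mathcal{A} (\int_{-\infty}^{\infty} \phi(x)^2 dx)^2 - \log \sigma_M$
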
with $I = \{[d/p]+1, [d/p]+2, \ldots\}$ and $[a]$ denotes the largest integer smaller than $a$. Here
$\sigma_M = \sigma_{1M} - \sigma_{2M}$ with $\sigma_{1M}$, $\sigma_{2M}$ defined in Proposition (3.2). If in addition, for some constant $c_1$, $\log \sigma_{1M}$ =
ciM^^A^^  +  o{Xl^)  where  A,  satisfies  0  <  A,  <  A^/^  then  M*/M*  -  1  =  0(n-i/2  (logn)^/^). 

Remark 2. By construction, $\hat{\beta}_{n,M^*}$ is higher order efficient under quadratic risk in the class of all
GMM estimators $\hat{\beta}_{n,M}$, $M \in I$.

Remark 3. For the truncated kernel $\int_{-\infty}^{\infty} \phi(x)^2 dx = 2$ and $\sigma_{2M} = 0$. Note that $\sigma_{1M}$ measures the
second order loss of efficiency caused by a finite number of instruments. This result follows from the
fact that $(P_M' \Omega_M^{-1} P_M)^{-1}$ is the asymptotic variance matrix of the estimator based on $M$ instruments.
Then $\sigma_{1M}$ has the interpretation of a generalized measure of relative efficiency of $\beta_{n,M}$. It corresponds
to $N^{-1} \sigma_\varepsilon^2 H^{-1} f'(I - P^K) f H^{-1}$ of Donald and Newey (2001, their notation). The term $\sigma_{2M}$ measures
the additional loss in efficiency due to the kernel. The constant $\mathcal{A}$ measures the simultaneity bias
caused by estimates of the optimal instruments. When $m = 1$ such that $\varepsilon_t$ is serially uncorrelated,
$\ell' D^{-1/2} \mathcal{A}_1 = \frac{p}{2} \sigma^{-2} \ell' D^{-1/2} \sigma_{\varepsilon x}$. Noting that here $Mp$ is the total number of instruments, it can be seen easily
that when $m = 1$ the penalty term essentially is the same as in Donald and Newey (2001).

Remark 4. It is shown in the proof of Proposition (4.3) that $M^* = \log n / (2c_0) + o(\log n)$ and asymptotically
does not depend on $\mathcal{A}$, $\mathcal{B}^{(q)}$, $\mathcal{B}_1$ or $\kappa$ up to order $o(\log n)$. If the result $\tilde{M}^*/M^* - 1 = o(1)$ is
sufficient then it is possible to replace $MIC(M)$ by the simpler criterion $n^{-1}(Mp)^2 - \log \sigma_{1M}$ where
$Mp$ is the total number of instruments. Note that this simplification is allowed only for models where
$\log \sigma_{1M}$ decays at an exponential rate. This is the case for the ARMA class.
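
The way a penalty of the form $(Mp)^2/n$ interacts with an exponentially decaying efficiency loss can be illustrated with a grid search. The sketch below is not the paper's procedure: it assumes a purely hypothetical efficiency measure $\sigma_{1M} = 1 - B\lambda^{2M}$ (no polynomial factor in $M$), takes $\lambda$ and $B$ as known, and starts the search at a user-supplied `min_M`. It then compares the minimizer with the leading term $\log n/(2c_0)$, $c_0 = -\log\lambda$.

```python
import math


def simplified_criterion(M, n, p, lam, B=1.0):
    """(Mp)^2 / n - log(sigma_1M) with the hypothetical sigma_1M = 1 - B lam^(2M)."""
    loss = B * lam ** (2 * M)
    return (M * p) ** 2 / n - math.log1p(-loss)


def select_M(n, p, lam, min_M=1, max_M=200, B=1.0):
    """Grid search for the number of lags minimizing the simplified criterion."""
    candidates = range(min_M, max_M + 1)
    return min(candidates, key=lambda M: simplified_criterion(M, n, p, lam, B))


lam = 0.5
c0 = -math.log(lam)
for n in (10 ** 3, 10 ** 6):
    chosen = select_M(n, p=1, lam=lam)
    leading_term = math.log(n) / (2.0 * c0)
    print(n, chosen, leading_term)
```

In this stylized setting the selected number of lags grows only logarithmically in the sample size, and it stays below the leading term because the lower order terms are not negligible at these sample sizes.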

Remark 5. A further simplification is available if the definition of $\varphi_n(M, \ell, k(.))$ is based on

$\mathrm{tr}\, D^{1/2} E b_{n,M} b_{n,M}' D^{1/2}$

instead of $\ell' D^{1/2} E(b_{n,M} b_{n,M}') D^{1/2} \ell$. Then the approximate MSE depends on $\mathrm{tr} \log \sigma_{1M} = \log |\sigma_{1M}| = \log |P_M' \Omega_M^{-1} P_M| - \log |D|$
where $\sigma_{1M} = D^{-1/2} P_M' \Omega_M^{-1} P_M D^{-1/2}$. The simplified criterion for this case is
$(Mp)^2/n + \log |(P_M' \Omega_M^{-1} P_M)^{-1}|$ and knowledge of the variance lower bound $D^{-1}$ is no longer required.
Note that this formulation of MIC(M) is quite similar to Hall and Inoue (2000) except for the penalty
term $(Mp)^2/n$.

4.  Fully  Feasible  GMM 

In  this  section  we  derive  the  missing  results  that  are  needed  to  obtain  a  fully  feasible  procedure.  In 
particular one needs to replace the unknown optimal bandwidth parameter $M^*$ by an estimate $\hat M^*$. In
order to have a fully feasible procedure we need consistent estimates of the constants $A_1$, $D$ and $\sigma_M$,
converging  at  sufficiently  fast  rates. 

The following analysis shows that estimation of $A_1$ can be done nuisance parameter free in the sense
that consistent estimates of $A_1$ do not depend on additional unknown parameters. Unfortunately the
same is not true for $D$ and $\sigma_M$. We use an approximating parametric model for $C(L)$ to estimate $D$
and $\sigma_M$.

We first consider the simpler estimation problem for the constant $A_1$ where

j  =  — oo 

Note  that  C  are  the  coefficients  in  the  series  expansion  of  f~^  (A) . 

Consistent estimates of the MA(m-1) representation of $\varepsilon_t$ can be obtained by using consistent
estimates of the parameter $\beta$ to obtain estimates $\hat\varepsilon_t$. An MA(m-1) model is then estimated for $\hat\varepsilon_t$. This
can be done by using a nonlinear least squares or pseudo maximum likelihood procedure as described
in chapter 8 of Brockwell and Davis (1991). This procedure is outlined in the proof of Lemma (4.1).
Because  of  the  exponential  decay  of  Cj  and  the  fact  that  m  is  finite,  F"  can  be  replaced  by  a  simple 
sample  average  based  on  estimated  residuals  F^^  =  n~^  St^minf  '+'"1/  ^t+m^t-j-  Using  these  estimates 
one  forms  ^1  by 

(4.1)  -^1  =  1       E      '^j'^T-m- 

j=-n+l 

To  summarize,  we  state  the  following  proposition. 

Proposition 4.1. Let Assumptions (A) and (B) be satisfied. Let $\hat A_1$ be defined in (4.1). Then
$\sqrt{n}(\hat A_1 - A_1) = O_p(1)$.

We  use  a  finite  order  VAR(h)  approximation  to  C{L)~^  =  C{L)  with  C{L)  =  YLT=o  CjL^  to  estimate 
the  parameters  D  and  G^].  It  follows  that  yt  =  Cq  C{\)^y  +  X^jlj  TTjyt-j  +  vt  where  ttj  =  Cq^Cj  and 
Vt  =  Cq  Ui.  Let  7r(L)  =  I  —  YI'tLi  ""j-^  and  Evtv[  =  Ei,.  The  approximate  model  with  VAR  coefficient 
matrices $\pi_{1,h}, \ldots, \pi_{h,h}$ is then given by

(4.2) $y_t = \mu_{y,h} + \pi_{1,h} y_{t-1} + \ldots + \pi_{h,h} y_{t-h} + v_{t,h}$

where $\Sigma_{v,h} = E v_{t,h} v_{t,h}'$ is the mean squared prediction error of the approximating model. VAR
approximations in a bandwidth selection context were proposed by Andrews (1991). There, however, $h$ is
kept fixed such that the resulting bandwidth choice is asymptotically suboptimal. We let $h \to \infty$ at a
data-dependent rate, to be described below, leading to the approximating model being asymptotically
equivalent to $\pi(L)$.

It was shown by Berk (1974) and Lewis and Reinsel (1985) that the parameters $(\hat\pi_{1,h}, \ldots, \hat\pi_{h,h})$ are
root-n consistent and asymptotically normal for $\pi(h) = (\pi_1, \ldots, \pi_h)$ if $h$ does not increase too quickly,
i.e. if $h$ is chosen such that $h^3/n \to 0$. At the same time $h$ must not increase too slowly to avoid
asymptotic biases. Berk (1974) shows that $h$ needs to increase such that $n^{1/2} \sum_{j=h+1}^{\infty} \|\pi_j\| \to 0$ as
$h, n \to \infty$. Ng and Perron (1995) argue that information criteria such as the Akaike criterion do not
satisfy these conditions and can therefore not be used to choose $h$. Moreover, the results of Hannan


and  Kavalieris  (1984,  1986)  imply  that  if  h  is  selected  by  AIC  or  BIC  then  h  ~  h  =  Op{\/logn)  and  h 
fails  to  be  adaptive  in  the  sense  of  Ng  and  Perron  (1995). 

To avoid the problems that arise from using information criteria to select the order of the
approximating model we use the sequential testing procedure analyzed in Ng and Perron (1995). Let
nih)  ^  (tt'i,  ...,<)' ,  Ft,,,  =  {y[  -  y' ,  ...,y[_^+,  -  y')'  and  A^  =  Et=/z+i  i'*-!,'.^'/-!,/.  and  define  Mf;\l) 
to  be  the  lower-right  p  x  p  block  of  Mf[^.  Let  Ffc  be  the  kp  x  kp  matrix  whose  (m,n)th  block  is  T^_^ 
and  r'j  ^  =  [r^^j,  ...,r^''^]  where  F^^^  =  Cov {yt-i,y[_j).  The  coefficients  of  the  approximate  model 
satisfy  the  Yule- Walker  equations  {ni^h,  ■■■,'^h,h)  =  ri,/,r^^  Let  fi,/,  =  (n  -  h)"-^  J2t=h  ^t,h  ivt+i  -  y) 
and  F/,  =  (n  —  h)~  Y^=h  ^t,hYth-  '^^^  estimated  error  covariance  matrix  is  S„  /^  =  n~^  Y17=h+i  ^t.h^'f  ^ 
where  vt^h  =  Ut  —  7i"i,/i2/t-i  —  •••  —  ^h,hyt-h  with  coefficients  Tr{h)'  =  Fj  ^^F^  .  A  Wald  test  for  the  null 
hypothesis  that  the  coefficients  of  the  last  lag  h  are  jointly  0  is  then,  in  Ng  and  Perron's  notation, 

J{h,h)  =n{vecnh,h)' \ty^h^Mf^^{l)j      {vecjTh^h)  ■ 

We  adopt  the  following  lag  order  selection  procedure  from  Ng  and  Perron  (1995). 

Definition 4.2. The general-to-specific procedure chooses i) $\hat h_n = h$ if, at significance level $\alpha$, $J(h, h)$
is the first statistic in the sequence $J(i, i)$, $\{i = h_{\max}, \ldots, 1\}$, which is significantly different from zero or
ii) $\hat h_n = 0$ if $J(i, i)$ is not significantly different from zero for all $i = h_{\max}, \ldots, 1$ where $h_{\max}$ is such that
$h_{\max}^3/n \to 0$ and $n^{1/2} \sum_{j=h_{\max}+1}^{\infty} \|\pi_j\| \to 0$ as $n \to \infty$.
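
The general-to-specific rule can be sketched for the scalar case as follows. This is only an illustration under simplifying assumptions: a univariate AR(h) fitted by OLS with an intercept replaces the VAR, the Wald statistic for the last lag has one degree of freedom, and the 5% critical value 3.841 is hard-coded. The function names are hypothetical.

```python
import numpy as np

CHI2_1_CRIT_5PCT = 3.841  # 5% critical value of a chi-square with 1 degree of freedom

def wald_last_lag(y, h):
    """Wald statistic for H0: the coefficient on lag h is zero in an AR(h) with intercept."""
    n = len(y)
    Y = y[h:]
    # Column j holds y_{t-j} aligned with Y_t = y_t, t = h, ..., n-1.
    X = np.column_stack([np.ones(n - h)] + [y[h - j:n - j] for j in range(1, h + 1)])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    sigma2 = resid @ resid / (n - h)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[-1] ** 2 / cov[-1, -1]

def general_to_specific(y, h_max, crit=CHI2_1_CRIT_5PCT):
    """Return the first h in h_max, ..., 1 whose last lag is significant, else 0."""
    for h in range(h_max, 0, -1):
        if wald_last_lag(y, h) > crit:
            return h
    return 0

# Simulated AR(2) data (illustrative parameter values).
rng = np.random.default_rng(0)
n, burn = 2000, 200
e = rng.standard_normal(n + burn)
y = np.zeros(n + burn)
for t in range(2, n + burn):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + e[t]
y = y[burn:]

h_hat = general_to_specific(y, h_max=6)
print(h_hat)
```

Because each test is run at a fixed significance level, the procedure can stop at a lag above the true order with positive probability, but it should rarely stop below it when the last true coefficient is well away from zero.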

In  order  to  calculate  the  impulse  response  coefficients  associated  with  (4.2)  define  the  matrix 


$$A_h = \begin{bmatrix} \pi_{1,h} & \pi_{2,h} & \cdots & \pi_{h,h} \\ I & 0 & \cdots & 0 \\ \vdots & \ddots & & \vdots \\ 0 & \cdots & I & 0 \end{bmatrix}$$

with dimensions $hp \times hp$. The $j$-th impulse coefficient of the approximating model is given by
$C_{j,h} = E_h' A_h^j E_h$ with $E_h' = [I_p, 0, \ldots, 0]$. The autocovariance function $\Gamma_j^{yy}$ is then approximated by
$\Gamma_{j,h}^{yy} = \sum_{i=0}^{\infty} C_{i+j,h} \Sigma_{v,h} C_{i,h}'$ for all $j = 1, 2, \ldots$. Likewise we approximate the optimal weight matrix by the
infinite  dimensional  matrix  Sl^  =  Y7P=-m+i^^ (J)^h{j)  where  the  infinite  dimensional  matrix  0/i(j) 
has  typical  k,  l-th  block  Tf^f._- 1^.  We  denote  the  k,  l-th  block  of  fi^^  by  -dkj^h-  We  define  Dh  by  letting 

Estimates $\hat C_{j,h} = E_h' \hat A_h^j E_h$ of $C_{j,h}$ are obtained by substituting $\hat\pi(h)$ for $\pi(h)$ in $A_h$ such that $\hat A_h$
is defined in the obvious way. Substituting estimates $\hat C_{j,h}$ for $C_{j,h}$ in $D_h$ leads to an estimate $\hat D_h$. A
fully feasible version $\hat D_{\hat h}$ is obtained by replacing $h$ with $\hat h$ as defined in Definition (4.2). In the proof
of Proposition (4.3) it is shown that $\sqrt{n}$-consistency of $\hat\pi(h)$ implies $\hat D_{\hat h} - D = O_p(n^{-1/2})$. Next, $\sigma_M$
is  approximated  by  aM,h  in  a  similar  way.  We  use  the  approximate  autocovariance  matrices  F^^  to 
form  the  possibly  infinite  dimensional  matrices  P^;;,  =  ^\\^  ■■■^^x;  t,  where  P^)  h  '^  defined  in  the 
obvious  way.  We  then  form  the  matrix  QM,h  =  P'm^h  i^Mp  ~  ^m)  ^a/,/i  +  ^MM^M'^h  (-^^p  ~  ^m)  ^^^ 
^M,h  =  QMAihip  -  P'mm  {P'M.h^MjiPM.hj      Kj jP-'hIh) '  The  parameter  aM,h  is  obtained  from 


(4-3)  (JM,h  =  (^\M,h  -  (^-ZM.h  with  CT2A/,ft  =  i'D^        bM,h^M,kb'MjiD^        £ 

where $D_h = P_{\infty,h}' \Omega_h^{-1} P_{\infty,h}$ and $\sigma_{1M,h} = \ell' D_h^{-1/2} P_{M,h}' \Omega_{M,h}^{-1} P_{M,h} D_h^{-1/2} \ell$. An estimate of $\sigma_{1M,h}$ is based
on $\hat\pi(h)$ in the same way as described above and $\sqrt{n}$-consistency uniformly in $M$ is implied by the result
for $\hat D_h$. In practice, infinite dimensional matrices such as $P_{\infty,h}$ and $\Omega_h$ have to be replaced by finite
dimensional  matrices.  The  resulting  approximation  can  be  made  arbitrarily  accurate  by  an  appropriate 
choice  of  the  relevant  dimensions  without  affecting  the  convergence  results. 
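
The impulse coefficients $C_{j,h} = E_h' A_h^j E_h$ of the approximating VAR can be computed numerically as in the sketch below. It assumes the usual companion structure for $A_h$ (VAR coefficient matrices in the first block row, identity blocks below the diagonal). The function names are illustrative and not taken from the paper.

```python
import numpy as np

def companion(pi_blocks):
    """Stack VAR(h) coefficient matrices pi_1, ..., pi_h (each p x p) into the hp x hp companion matrix."""
    h = len(pi_blocks)
    p = pi_blocks[0].shape[0]
    A = np.zeros((h * p, h * p))
    A[:p, :] = np.hstack(pi_blocks)          # first block row holds the coefficients
    A[p:, :-p] = np.eye((h - 1) * p)         # identity blocks shift the lags down
    return A

def impulse_coefficients(pi_blocks, n_coef):
    """Return [C_0, C_1, ..., C_{n_coef-1}] with C_j = E' A^j E."""
    p = pi_blocks[0].shape[0]
    A = companion(pi_blocks)
    E = np.zeros((A.shape[0], p))
    E[:p, :] = np.eye(p)                      # selector E' = [I_p, 0, ..., 0]
    out, A_power = [], np.eye(A.shape[0])
    for _ in range(n_coef):
        out.append(E.T @ A_power @ E)
        A_power = A_power @ A
    return out

# Illustrative coefficient values only.
pi1 = np.array([[0.5, 0.1], [0.0, 0.3]])
var1_coefs = impulse_coefficients([pi1], 4)
ar2_coefs = impulse_coefficients([np.array([[0.5]]), np.array([[0.3]])], 3)
print(ar2_coefs[2])
```

For a VAR(1) the coefficients reduce to powers of $\pi_1$, which gives a simple check of the construction.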

We now define the estimate $\hat\sigma_{M,\hat h}$ where we replace $K_M$ in $b_{M,h}$ by $\hat K_M$ with typical element
$\phi(k(i/M), \sqrt{\log \hat\sigma_{1M,\hat h}})$. Moreover, we replace all elements in (4.3) with corresponding estimates based
on $\hat\pi(\hat h)$. In Proposition (4.3) we establish that $(\hat\sigma_{M,\hat h} - \sigma_M)/[M^{2s} \lambda^{2M}] = O_p(n^{-1/2} (\log n)^{1/2+s'})$
uniformly in $M$ where $s'$ is defined in Proposition (4.3). Establishing this result requires a stronger
form of uniform convergence of the approximating parameters $\hat\pi(\hat h)$ than was needed to show
$\hat D_{\hat h} - D = O_p(n^{-1/2})$, thus explaining the slower rate of convergence. Also let
$\hat A = \ell' \hat D_{\hat h}^{-1/2} \hat A_1' \hat A_1 \hat D_{\hat h}^{-1/2} \ell$ where $\hat A_1$ is defined in (4.1). Define $\hat M^*$ as

(4.4) $\hat M^* = \arg\min_{M \in I} \hat A \frac{(Mp)^2}{n} \left( \int_{-\infty}^{\infty} \phi(x)^2 dx \right)^2 - \log \hat\sigma_{M,\hat h}.$

The  following  result  can  be  established. 
Proposition 4.3. Let $\hat M^*$ be defined in (4.4) and $M^*$ as in (3.7). Then $(\hat M^*/M^* - 1) = o_p(1)$. If in
addition $\log \sigma_{1M} = c_1 M^{2s} \lambda^{2M} + o(\lambda_r^{2M})$ with $0 < \lambda_r < \lambda^{3/2}$ then $(\hat M^*/M^* - 1) = O_p(n^{-1/2} (\log n)^{1/2+s'})$
with  s'  =  si  |Ai  >  Ai|  +  Si  |Ai  <  Aij  . 

Remark 6. It is always the case that $\log \sigma_{1M} = c_1 M^{2s} \lambda^{2M} + o(\lambda_r^{2M})$ for some $\lambda_r$ such that $0 < \lambda_r < \lambda$.
Here we require that $\lambda_r$ is not too close to $\lambda$, i.e. that the remainder term disappears sufficiently fast.

Ultimately, one is interested in the properties of a fully automated estimator $\hat\beta_{n,\hat M^*}$ where the data
determined optimal bandwidth $\hat M^*$ is plugged into the kernel function and the data-dependent kernel
$\phi(k(i/\hat M^*), \sqrt{\log \hat\sigma_{1\hat M^*,\hat h}})$ is used. In order to analyze this estimator we need an additional Lipschitz
condition  for  the  class  of  permitted  kernels. 

Assumption E. The kernel $k(j/M)$ satisfies $|k(x) - k(y)| \le c_1 |x^q - y^q|$ $\forall x, y \in [0, 1]$ for some $c_1 < \infty$
and $q \ge 1$.

Assumption  (E)  corresponds  to  the  assumptions  made  in  Andrews  (1991).  Using  the  previous 
results  we  are  now  in  a  position  to  state  one  of  the  main  results  of  this  paper  which  establishes  that 
an  automated  bandwidth  selection  procedure  can  be  used  to  pick  the  number  of  instruments  based  on 
sample  information  alone. 

Theorem 4.4. Suppose Assumptions (A) and (B) hold and either i) $k(.,.)$ is defined in (3.4) and satisfies
Assumptions (D) and $k(.)$ used as an argument in $\phi(.,.)$ satisfies Assumptions (Cii) and (E) where $\phi(.,.)$
is as defined in (3.3) or ii) $k(.,.) = \{|x| \le 1\}$. Assume that $\log \sigma_{1M} = c_1 M^{2s} \lambda^{2M} + o(\lambda_r^{2M})$ with 0 <
A^  <  A3/2.  Let  M*be  defined  in  (4.4)  then  for  case  i)  n/^/W0„  a/-  -  K.M^)  =  Op((logn)™^''(^''"«'"^)) 
and  for  case  ii)  n/VM*(/3^  ^;^,  —  fin.M')  —  Op(l)-  Also,  if  i)  and  s'  —  q  <  0  or  ii)  holds  then 

Remark 7. Under the conditions of the Theorem, $M^*$ can be replaced by $\hat M^*$ in the last display as
well as in Proposition (4.3).

Remark  8.  Although  the  constant  s'  is  typically  unknown  it  is  often  reasonable  to  assume  that  s'  =  1 
which  requires  that  Aj  7^  Ai  and  there  are  no  multiplicities  in  the  largest  roots. 

Theorem (4.4) shows that using the feasible bandwidth estimator $\hat M^*$ results in estimates $\hat\beta_{n,\hat M^*}$
that have asymptotic mean squared errors which are equivalent to asymptotic mean squared errors of
estimators where a nonrandom optimal bandwidth sequence $M^*$ is used. An immediate consequence
of the Theorem is also that $\hat\beta_{n,\hat M^*}$ is first order asymptotically equivalent to the infeasible estimator
$D^{-1} d_0$. The theorem however shows in addition higher order equivalence of $\hat\beta_{n,\hat M^*}$ and $\hat\beta_{n,M^*}$. In this
respect  Theorem  (4.4)  is  stronger  than  Proposition  4  of  Donald  and  Newey  (2001). 

5.  Bias  Reduction  and  Kernel  Selection 

In this section we analyze the asymptotic bias of $\hat\beta_{n,M}$ as a function of the sample size $n$ and the
bandwidth parameter $M$.

Theorem 5.1. Suppose Assumptions (A) and (B) hold and $k(.,.)$ satisfies Assumption (D). If $M \to \infty$
and $M/n^{1/2} \to 0$ then

$\lim_{n \to \infty} \sqrt{n}/M \, E(b_{n,M} - \beta) = D^{-1} A_1' \int_{-\infty}^{\infty} \phi^2(x) dx.$

Remark 9. This result also holds for kernels satisfying Assumption (C) if $\int \phi^2(x) dx$ is replaced by
$\int k^2(x) dx$.

A simple consequence of this result is that for many standard kernels the asymptotic bias of the
kernel  weighted  GMM  estimator  is  lower  than  the  bias  for  the  standard  GMM  estimator  based  on  the 
truncated  kernel. 

Corollary 5.2. Suppose Assumptions (A) and (B) hold and $k(.,.)$ satisfies Assumption (D) with
$\int \phi^2(x) dx < 2$ or Assumption (C) with $\int k^2(x) dx < 2$. If $n, M \to \infty$, $M/n^{1/2} \to 0$ then

$\lim_{n \to \infty} \|\sqrt{n}/M \, E(b_{n,M} - \beta)\| < \lim_{n \to \infty} \|\sqrt{n}/M \, E(b_{n,M}^T - \beta)\|$

where $b_{n,M}^T$ is the stochastic approximation to the GMM estimator based on the truncated kernel.

It can be shown easily that substituting well known kernels such as the Bartlett, Parzen or
Tukey-Hanning in $\phi(v, z)$ leads to $\int \phi^2(x) dx$ being equal to 16/15, 67/64 and 1.34 respectively.
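
The Bartlett value can be checked numerically. The sketch below assumes the one-argument form $\phi(k) = 2k - k^2$ that the text uses when defining the optimal kernel, and applies a midpoint rule on $[-1, 1]$; the second argument of $\phi(v, z)$ is ignored, so this is an illustration rather than the paper's own computation. Function names are hypothetical.

```python
def phi(k):
    """One-argument kernel transform phi(k) = 2k - k^2 (assumed form)."""
    return 2.0 * k - k * k

def integral_phi_squared(kernel, n_grid=200_000):
    """Midpoint-rule approximation of the integral of phi(k(x))^2 over [-1, 1]."""
    width = 2.0 / n_grid
    total = 0.0
    for i in range(n_grid):
        x = -1.0 + (i + 0.5) * width
        total += phi(kernel(x)) ** 2
    return total * width

truncated = lambda x: 1.0 if abs(x) <= 1 else 0.0
bartlett = lambda x: max(0.0, 1.0 - abs(x))

truncated_value = integral_phi_squared(truncated)
bartlett_value = integral_phi_squared(bartlett)
print(truncated_value, bartlett_value, 16 / 15)
```

For the Bartlett kernel $\phi(k(x)) = 1 - x^2$, so the integral is $2(1 - 2/3 + 1/5) = 16/15$, while the truncated kernel gives 2.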

We  now  turn  to  the  question  of  optimality  in  the  higher  order  MSE  sense  of  the  choice  of  kernel 
function.  Let  k*  =  Hml/  ( A^^^7'Af^2s-2  J  ^here  M^  is  optimal  for  the  truncated  kernel.  Note  that 
by  optimality  of  il/^,  0  <  k*  <  oo.  From  Proposition  (3.2)  it  follows  at  once  that  any  kernel  for 
which  A  (  [  [_^  (f>{x)~dx)  -  4  |  +  c^k'^B^''^ / k*  <  0  dominates  the  truncated  kernel.  In  Theorem  (5.3) 
a simple variational argument is used to show that we can always find a kernel $k(.,.)$ such that this
inequality is satisfied uniformly on a compact subset of the parameter space. This result raises the
question of finding an optimal or at least dominating kernel. When $q = 2$ is fixed this problem has been
solved for standard kernels by Priestley (see Priestley, 1981 p. 569). In our context of rate-adaptive
kernels with fully flexible $q$ it is not known whether closed form solutions of the associated functional
optimization problem exist. In any event, such solutions most likely depend on the constant $B^{(q)}$ which
is difficult to estimate.

To avoid these complications we propose the following data-dependent solution to the optimal kernel
selection problem. Let $\phi(k(x), M^s \lambda^M)$ be as described before. Because $\phi(\cdot, \cdot)$ enforces adaptiveness of
the kernel we only choose $k(x)$, which by the Weierstrass Theorem can be approximated by polynomials.
Let $r$ be a finite integer. Define $k(x) = 1 + \sum_{i=1}^{r} \psi_i x^i$ for $x \in [0, 1)$, $k(x) = k(|x|)$ for $x < 0$ and $k(x) = 0$
if  |x|  >  1.  Let  ip  =  (i/'j,  ■■■,ip^)  ■  Then  for  j  =  1, ...,  r  define  U^  c  M'^,  such  that  C/1  is  compact  and  for 
ip^  =  [ip-^^,  ..,tpj_i,0,ipj^i,..,tpT.),  it  follows  ili^  ^  C/^.  Also  let  IC^  =  {|x|  >  1}.  The  permissible  class 
of  kernels  is  ICj=  K.^  U  fC^  where  for  j  >  1,  /C^  =  h{x)\k{x)  =  1  +  J^'^^^  ipiX',  k{x)  e  [-1,  l],ipe  C/^| 
and  it  is  understood  that  k{x)  also  satisfies  the  restrictions  outlined  before.  The  optimal  kernel  k*{x) 
with $\phi(k(x)) = 2k(x) - k(x)^2$ satisfies

(5.1)  a(  r  (f){k*{x)fdx]    +k*?B^'^'^/K*  <a(  I      (l>{k{x))^dx]    +  A;2b('?)/k*,  all    k{.,.)eICg 

\J-oo  J  \J-oo  J  for  all  q>q' ,q'>l 

Note that optimality is pointwise in $A$ and $B^{(q)}$ which means that in general $k^*$ depends on $A$ and
$B^{(q)}$. It will be shown that (5.1) is a reasonable optimality criterion because one of the main objectives
in  this  section  is  to  construct  kernel  functions  that  dominate  the  truncated  kernel.  To  see  why  (5.1) 
implies  dominance  note  that  the  particular  choice  of  k*  guarantees  that  k*  has  the  same  variance  term 
B\/k*  as  the  truncated  kernel  when  evaluated  along  the  sequence  M^.  Once  the  kernel  k*  is  selected, 
its  MSE,  when  evaluated  under  its  own  optimal  M*  sequence,  can  be  no  worse  than  under  the  M^ 
sequence.  A  data-dependent  optimal  kernel  is  defined  as  k*  where 

(5.2)  k*  =  argmin         a(  T  <p(k(x)fdx]    +^^(2-^)      a,^,-,.  r. 

The  notation  o^^.,  -,  is  used  to  emphasize  that  the  A'M-matrix  used  to  construct  ct2a/  contains  diagonal 
elements  0(/c,  /ctj  ^^. )  depending  on  k  and  M^  is  the  optimal  bandwidth  for  the  truncated  kernel.  We 
establish the following result.

Theorem  5.3.  Let  k*{x)  be  defined  as  in  (5.2).  Then  for  any  q'  6  [1,t],  r  <  oo, 

sup  (k*{x)  -  fc*(x))  =  Op{n-^^^  (logn)^/2+.') 

Let  Pn,M'{k'),k'  ^^  ^■'^^  kernel  weighted  GMM  estimator  with  iiernel  k*  and  let  /3^  M'(k')  k'  ^^  ^■''^  GMM 
estimator  based  on  k*.  Here,  M*{k)  =  argmin^M^n"^  —  loga.    r  where  for  AJ{k*)  we  use  a'l^j  with 
^^Mh  ^^^^^  (t>{^* :  \/1oS'^im/i)  -"^  used  as  kernel  weight.  Then,

Furthermore,  let  0  =  {Ai, ...,  Ap^,  Bi, ...,  Bqi^;pa,qb  finite]  be  the  set  of  all  reduced  form  models  that 
satisfy  Assumption  (B).  Let  Go  be  a  compact  set  Go  C  O  such  that  sup^^j  fi''?'  <  oo  and  infeo  -4  > 
0.  Then,  for  some  collection  of  sets  U^,  each  sufficiently  large,  there  exists  a  A:  G  IC^'  with  kg'   = 

lim^r^o  (l  -  ~k{x)^  I  |x|''  for  any  q'  6  [l,r],T  <  oo  such  that  siipe^  ^  (  (f^^^{k{x)fdxj  -  4  j  + 
^2,6(9')/^*  <  0. 

The second part of the theorem implies that the truncated kernel is always dominated by $k^*$. This
is the case because the truncated kernel $\{|x| \le 1\} \in \mathcal{K}$ and there is an element in $\mathcal{K}$ that strictly
dominates it.

6.  Bias  Correction 

Another  important  issue  is  whether  the  bias  term  can  be  corrected  for.  The  benefits  of  such  a  correction 
are  analyzed  first.  It  turns  out  that  correcting  for  the  bias  term  increases  the  optimal  rate  of  expansion 
for the bandwidth parameter and consequently accelerates the speed of convergence to the asymptotic
normal  limit  distribution. 

Using  the  result  in  Theorem  (5.1)  the  following  bias  corrected  estimator  is  proposed 

(6.1) $\hat\beta_{n,M}^c = \hat\beta_{n,M} - \frac{M}{n} (\hat P_M' \hat\Omega_M^{-1} \hat P_M)^{-1} \hat A_1' \int \phi^2(x) dx.$

Note that for standard GMM (truncated kernel) the bias correction term is simply $2 \frac{M}{n} (\hat P_M' \hat\Omega_M^{-1} \hat P_M)^{-1} \hat A_1'$.
The bias term $A_1$ can be estimated by the methods described in Section 4. The quality of
the estimator $\hat A_1$ determines the impact of the correction on the higher order convergence rate of the
estimator. If $\hat A_1 - A_1$ is only $o_p(1)$ then the convergence rate of $\hat\beta_{n,M}^c$ is essentially the same as the one
for $\hat\beta_{n,M}$. If $\hat A_1 - A_1 = O_p(n^{-\eta})$ for $\eta \in (0, 1/2]$ then the convergence rate of the estimator is improved.
The mean squared error of the bias corrected estimator is defined as before by


nD'/^£'Eb';,^„bl,/(D'/^  =  1  +  ^^(M.^.fc(.))  +  JR^, 


M 


where  ^/ri.  (/3„jv;  ~  /^j  =^n  M  +'n  m  ^^'i^h  the  same  restrictions  imposed  on  the  remainder  terms  /?^  ^^ 
and  r^jv^  as  in  (3.5).  We  obtain  the  following  result. 
Theorem 6.1. Suppose Assumptions (A) and (B) hold and $k(.,.)$ satisfies Assumptions (Ci) or (D).
Then for any $\ell \in R^d$ with $\ell'\ell = 1$, $\varphi_n^c(M, \ell, k(.)) = O(M/n) - \log \sigma_M$. The optimal $M_c^*$ can be chosen
by

$M_c^* = \arg\min_M \frac{Mp}{n} - \log \sigma_M.$

IfM*  =  argmin  ^  -  log^M,  then  (m*JM;  -  l)  =  Op(n-i/2  (logn)''^^/^)  and 
n/y/W*  (^pIjCi,  -  K^i  +  ^D-'A[  I  hHx)dx^  =  Op(l). 
RemEtrk  10.   The  result  remains  valid  if  $^  j(^.  is  replaced  with  P^  M^/k')  y  J^  T^J)  as  long  as  q'  >  s' . 

•"  C 

It follows from Theorem (6.1) that for $\hat\beta_{n,M^*}^c$ the higher order MSE is $O(\log n/n)$ compared to
$O((\log n)^2/n)$ for the GMM estimator without bias correction.

7. Monte Carlo Simulations

A  small  Monte  Carlo  experiment  is  conducted  in  order  to  assess  the  performance  of  the  proposed 
moment  selection  and  bias  correction  methods.  For  the  simulations  we  consider  the  following  data 
generating  process 

(7.1) $y_{t,1} = \beta y_{t,2} + u_t - \theta u_{t-1}$

$y_{t,2} = \phi y_{t-1,2} + v_t$

with $[u_t, v_t] \sim N(0, \Sigma)$ where $\Sigma$ has elements $\sigma_1^2 = \sigma_2^2 = 1$ and $\sigma_{12}$. The parameter $\beta$ is the parameter to
be estimated and is set to $\beta = 1$ in all simulations. All remaining parameters are nuisance parameters
not explicitly estimated. The parameter $\sigma_{12}$ is one of the determinants of the small sample bias of
both Ordinary Least Squares (OLS) and GMM estimators and is set to .5. The parameter $\phi$ controls
the quality of lagged instruments and is chosen in $\{.1, .3, .5\}$. Low values of $\phi$ imply that the model is
poorly identified. The parameter $\theta$ finally is set to $\{-.9, -.5, 0, .5, .9\}$.

We generate samples of size $n = \{128, 512\}$ from Model (7.1). Starting values are $y_0 = 0$ and
$[u_0, v_0] = 0$. In each sample the first 1,000 observations are discarded to eliminate dependence on
initial  conditions. 
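
A minimal sketch of this data generating process, assuming the reading of Model (7.1) as $y_{t,1} = \beta y_{t,2} + u_t - \theta u_{t-1}$ and $y_{t,2} = \phi y_{t-1,2} + v_t$ with unit variances and covariance $\sigma_{12}$. The function name, default values and seed handling are illustrative; the authors' simulations were written in Matlab.

```python
import numpy as np

def simulate_model_71(n, beta=1.0, phi=0.5, theta=0.5, sigma12=0.5, burn=1000, seed=0):
    """Simulate n observations of (y1, y2) after discarding `burn` initial draws."""
    rng = np.random.default_rng(seed)
    total = n + burn
    cov = np.array([[1.0, sigma12], [sigma12, 1.0]])
    shocks = rng.multivariate_normal(np.zeros(2), cov, size=total + 1)
    u, v = shocks[:, 0], shocks[:, 1]
    u[0] = v[0] = 0.0                     # starting values [u_0, v_0] = 0
    y2 = np.zeros(total + 1)              # starting value y_0 = 0
    for t in range(1, total + 1):
        y2[t] = phi * y2[t - 1] + v[t]
    y1 = np.zeros(total + 1)
    y1[1:] = beta * y2[1:] + u[1:] - theta * u[:-1]
    return y1[burn + 1:], y2[burn + 1:]

y1, y2 = simulate_model_71(128)

# Large-sample OLS slope, to illustrate the simultaneity bias induced by sigma12.
y1_big, y2_big = simulate_model_71(50_000, seed=1)
ols_slope = (y2_big @ y1_big) / (y2_big @ y2_big)
print(len(y1), ols_slope)
```

With these default values the OLS probability limit is $\beta + (\sigma_{12} - \theta \phi \sigma_{12})(1 - \phi^2) = 1.28125$, which shows why instruments are needed.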

Standard GMM estimators are obtained from applying Formula (3.2) with $K_M = I_M$. In order
to estimate $\Omega_M$ we first construct an inefficient but consistent estimate $\hat\beta_{n,1}$ based on (3.2) setting
$K_M = I_M$ and $\hat\Omega_M = I_M$. We then construct residuals $\hat\varepsilon_t = y_{1t} - \hat\beta_{n,1} y_{2t}$ and estimate $\Omega_M$ as described
in (3.1). Kernel weighted GMM estimators (KGMM) are based on the same inefficient initial estimate
such that the estimate for $\Omega_M$ is identical to the weight matrix used for the standard GMM estimators.
In the second stage we again apply (3.2) with $\hat\Omega_M$ and the matrix $K_M$ based on the optimal
data-dependent kernel $\hat k^*$ defined in Equation (5.2) with $\mathcal{K}_q = \mathcal{K}_1$.
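
The idea of a first-stage estimate built from a fixed number of lagged instruments can be sketched with a plain two-stage least squares estimator. This is not Formula (3.2), which is not reproduced in this section: the sketch simply uses $M$ lags of $y_t = [y_{1t}, y_{2t}]'$, dated at least $m = 2$ periods back so that they are uncorrelated with the MA(1) error, and no kernel weights. All function names and parameter values are hypothetical.

```python
import numpy as np

def simulate(n, beta=1.0, phi=0.5, theta=0.5, sigma12=0.5, burn=1000, seed=0):
    """Draw one sample from the assumed form of Model (7.1) after a burn-in period."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, sigma12], [sigma12, 1.0]])
    shocks = rng.multivariate_normal(np.zeros(2), cov, size=n + burn + 1)
    u, v = shocks[:, 0], shocks[:, 1]
    y2 = np.zeros(n + burn + 1)
    for t in range(1, n + burn + 1):
        y2[t] = phi * y2[t - 1] + v[t]
    y1 = beta * y2 + u - theta * np.concatenate(([0.0], u[:-1]))
    return y1[burn + 1:], y2[burn + 1:]

def lagged_instrument_2sls(y1, y2, M, m=2):
    """2SLS of y1 on y2 using instruments y_{t-m}, ..., y_{t-m-M+1} (both series)."""
    Y = np.column_stack([y1, y2])
    n = len(y1)
    start = m + M - 1                     # first usable observation
    Z = np.column_stack([Y[start - m - j:n - m - j] for j in range(M)])
    Z = Z - Z.mean(axis=0)
    x = y2[start:] - y2[start:].mean()
    y = y1[start:] - y1[start:].mean()
    x_fitted = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]   # first-stage projection
    return (x_fitted @ y) / (x_fitted @ x)

y1, y2 = simulate(20_000, seed=1)
beta_iv = lagged_instrument_2sls(y1, y2, M=3)
beta_ols = (y2 @ y1) / (y2 @ y2)
print(beta_iv, beta_ols)
```

In a large sample the instrumented estimate should sit close to the true value $\beta = 1$, while OLS remains biased.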

The estimated optimal bandwidth $\hat M^*$ is computed according to the procedure laid out in Theorem
(5.3).^1 For each simulation replication we obtain a consistent first stage estimate $\hat\beta_{n,1}$ to generate
residuals $\hat\varepsilon_t$. We estimate $\theta$ by fitting an MA(1) model to $\hat\varepsilon_t$ using the Matlab procedure arimax. We
then estimate the sample autocovariances $\hat\Gamma_j$ for $j = 0, \ldots, n/2$ where $n$ is the sample size and form an
estimate of $A_1$ based on Formula (4.1). Next we use the procedure of Definition (4.2) to determine the
optimal  specification  of  the  approximating  VAR  for  yt  =  [yu,  y2t]'  allowing  for  a  maximum  of  2  *  [71^/^] 
of lags. Based on the optimal lag length specification we compute the impulse coefficients of the VAR
which are then used to estimate the remaining parameters $D$ and $\sigma_M$ needed for the criterion MIC(M)
as well as for optimal kernel selection.

In Tables 1-3 we compare^2 the performance of feasible kernel weighted GMM with optimally chosen
kernel, KGMM-Opt, to feasible standard GMM, GMM-Opt, with truncated kernel. In addition to
automatic selection of $M^*$ we consider both estimators, KGMM-X (with optimally chosen kernel) and
GMM-X, with a fixed number of X = 1, 20 lagged instruments. We also report the performance of the
corresponding bias corrected versions BGMM-Opt and BKGMM-Opt where $M^*$ is selected according
to  the  procedure  described  in  Theorem  (6.1). 

In  order  to  separate  the  effects  of  selecting  M  from  the  properties  of  using  weights  for  the  moment 
conditions  we  first  consider  GMM  and  KGMM  with  a  fixed  number  of  instruments.  Tables  1-3  show 
that for $\phi$ small relative to the sample size there is little difference between the two estimators. They
are also not very different from OLS. As the identification of the model improves, KGMM starts to
dominate GMM both in terms of (median) bias as well as MSE and mean absolute error (MAE). This
effect  becomes  more  pronounced  as  more  and  more  instruments  are  being  used  which  can  be  explained 
by  the  predominance  of  bias  terms  in  this  case  and  the  bias  reducing  property  of  the  kernel  weighted 
estimator. 

Turning now to the fully feasible versions we see that the same results continue to hold. For poorly
identified models the choice of $M$ does not affect bias that much and all the estimators considered
have roughly the same bias properties. Especially for poorly identified parametrizations optimal GMM
is much more dispersed than optimal KGMM. The reason for this lies in the fact that optimal GMM
tends to be based on fewer instruments which results in somewhat lower bias but comes at the cost
of increased variability. As identification, parameterized by $\phi$, and/or sample size improve, optimally
chosen kernel weighted GMM starts to dominate standard GMM for most parameter combinations.

^1 The Matlab code is available on request.
^2 Results for $\theta = \{-.9, .9\}$ are available on request.

The bias corrected versions of both estimators attain further improvements both as far as bias
as well as MSE and MAE are concerned when the model is well identified. In these circumstances
the kernel weighted and bias corrected GMM tends to have a somewhat larger MSE than the
non-weighted version. On the other hand the non-weighted version tends to overcorrect bias in some cases.
A clear ranking is thus not possible. The bias corrected estimators tend to perform relatively poorly
compared to GMM-Opt and KGMM-Opt when identification is weak or when $|\theta|$ is large. Overall,
their  performance  is  more  sensitive  to  the  underlying  data-generating  process. 

In  the  theoretical  development  of  the  paper  we  have  maintained  the  assumption  of  conditional 
homoskedasticity  of  the  innovations.  While  the  strongest  results  concerning  higher  order  adaptive- 
ness  and  optimality  of  bias  correction  in  Theorems  (4.4)  and  (6.1)  are  not  expected  to  go  through 
without  the  homoskedasticity  assumption  it  is  still  expected  that  the  optimal  M*  —  log  n/  (2  log  A) 
asymptotically.  In  this  sense  it  is  plausible  that  the  criterion  MIC(M)  performs  reasonably  well 
even with heteroskedastic errors. We investigate this question by changing the first equation in
Model (7.1) to $y_{1t} = \beta y_{2t} + \varepsilon_t - \theta \varepsilon_{t-1}$ where $\varepsilon_t$ follows the IGARCH(1,1) process $\varepsilon_t = u_t h_t^{1/2}$ with
$h_t = a_0 + a_1 \varepsilon_{t-1}^2 + b_1 h_{t-1}$. We set $b_1 = .2$, $a_0 = .1$ and $a_1 = .8$. The innovations $[u_t, v_t]$ are defined
as before. Since heteroskedasticity of this form is easy to detect in the data we assume that GMM
estimators are now implemented with heteroskedasticity consistent covariance matrix estimators $\hat\Omega_M$.
For  simplicity  we  use  the  procedure  of  Newey  and  West  (1987)  with  a  fixed  number  of  lags. 
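
The heteroskedastic error process can be simulated as in the sketch below, assuming the recursion $h_t = a_0 + a_1 \varepsilon_{t-1}^2 + b_1 h_{t-1}$ with $\varepsilon_t = u_t h_t^{1/2}$ and standard normal $u_t$. The correlation between $u_t$ and $v_t$ in the full model is left out here, and the starting value for $h_t$ and the burn-in length are arbitrary choices made for illustration.

```python
import numpy as np

def simulate_igarch_errors(n, a0=0.1, a1=0.8, b1=0.2, burn=1000, seed=0):
    """Simulate eps_t = u_t * sqrt(h_t) with h_t = a0 + a1 * eps_{t-1}^2 + b1 * h_{t-1}."""
    rng = np.random.default_rng(seed)
    total = n + burn
    u = rng.standard_normal(total)
    eps = np.zeros(total)
    h = np.zeros(total)
    h[0] = a0                              # arbitrary positive start; a1 + b1 = 1 rules out a finite unconditional variance
    eps[0] = u[0] * np.sqrt(h[0])
    for t in range(1, total):
        h[t] = a0 + a1 * eps[t - 1] ** 2 + b1 * h[t - 1]
        eps[t] = u[t] * np.sqrt(h[t])
    return eps[burn:], h[burn:]

eps, h = simulate_igarch_errors(512)
print(eps.shape, h.min())
```

Since $a_1 + b_1 = 1$, occasional bursts of very large conditional variance are to be expected, which is consistent with the outliers reported for some estimators.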

The  results  are  reported  in  Table  4  for  the  case  of  ^  =  .5.  Results  for  other  parametrizations 
are  available  on  request.  For  a  fixed  number  of  moment  conditions  KGMM  still  dominates  GMM  in 
many  cases  for  the  larger  sample  size  n  =  512.  The  estimator  GMM-Opt  continues  to  perform  well 
while  KGMM-Opt  now  does  worse  when  identification  is  weak  but  performs  at  least  as  well  when 
identification  is  strong  and/or  sample  sizes  are  large.  As  expected,  bias  correction  is  no  longer  effective 
in  reducing  bias.  Moreover,  when  combined  with  kernel  weighting,  it  produces  severe  outliers  resulting 
in  inflated  dispersion  measures.  For  this  reason  only  inter  decile  ranges  (IDR)  are  reported. 

8.  Conclusions 

We  have  analyzed  the  higher  order  asymptotic  properties  of  GMM  estimators  for  time  series  models. 
Using  expressions  for  the  asymptotic  Mean  Squared  Error  a  selection  rule  for  the  optimal  number  of 
lagged  instruments  is  derived.  It  is  shown  that  plugging  an  estimated  version  of  the  optimal  rule  into 
the  estimator  leads  to  a  fully  feasible  GMM  procedure. 

A  new  version  of  the  GMM  estimator  for  linear  time  series  models  is  proposed  where  the  moment 
conditions are weighted by a kernel function. It is shown that optimally chosen kernel weights of the
moment restrictions reduce the asymptotic bias and MSE. Correcting the estimator for the highest
order bias term leads to an overall increase in the optimal rate at which higher order terms vanish
asymptotically. A fully automatic procedure to choose both the number of instruments as well as the
optimal  kernel  is  proposed. 
A. Proofs

Auxiliary  Lemmas  are  collected  in  Appendix  B  which  is  available  upon  request.  They  are  referred  to 
in  the  text  as  Lemma   (B.XX).  Before  stating  the  proofs  a  few  commonly  used  terms  are  defined. 

Definition  A.l.  Let  ^t^  =  Ext-  Define  wt,i  =  {xt+m  —  Mx)  {vt-i-i^i  —  jJ-y)  ,  T^^  =  Ewt^i  and  F^^  = 
Ew[i  and  let  wt,i  =  Wf^i  -  T^^.  Next  define  wl^^^  =  {yt-i  -  Hy)  {yt-j  -  My)'  with  Ew\-_^  =  TjV. 
Define  vt^i  =  et+miyt-i+i  —  i^y)  ^nd  Est+mys  =  ^l-s-  ^^^  ^^^  j,k-tli  element  of  Q,  Q~^  and  Q^  be 
^j,k,'&j,k  3.nd  'djf.  respectively.  For  a  matrix  A,  \\A\\    =  tr  AA'. 

Proof  of  Proposition  (3.2).  Consider  a  second  order  Taylor  approximation  of  D'^f   around  D~^ 
such  that  for  d-M  =  ^m  =  PM^M^Mn"^^^  Y17=r  ^t+mZtM^M, 

$\sqrt{n}(\hat\beta_{n,M} - \beta) = D^{-1}[I - (\hat D_M - D)D^{-1} + (\hat D_M - D)D^{-1}(\hat D_M - D)D^{-1}]\hat d_M + o_p(M/\sqrt{n})$

where for $M/n^{1/2} \to 0$ the error term is $o_p(M/\sqrt{n})$ by the Taylor theorem, and the fact that $\det D \ne 0$,
$\hat D_M - D = O_p(M/n^{1/2})$ as shown in Lemmas (B.14)-(B.23) and $\hat d_M = O_p(1)$ by Lemmas (B.24) to
(B.33).  We  decompose  the  expansion  into  Dj^;  —  D  =  Hi  +  ...  +  H4  where  Hi  =  P'j^jKm^m  ^mPm  — 
P'n-'P,  H2  =  P'mKm^mKmPm  -  P'mKm^mKmPm,  H3  =  -P'^^Km^m{^m  -  ^mWmKmPm 
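and $H_4$ is defined in (A.14). Also, $\hat d_M = d_0 + d_1 + \dots + d_9$ with $d_j$ defined in (A.15)-(A.24) such that
$\sqrt{n}(\hat\beta_{n,M} - \beta) = b_{n,M} + O_p(M/\sqrt{n})$ with

$$b_{n,M} = D^{-1}\sum_{i=0}^{9} d_i - D^{-1}\sum_{i=1}^{4}\sum_{j=0}^{9} H_i D^{-1} d_j.$$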
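
The second order expansion of an inverse matrix used in this proof can be checked numerically. The sketch below compares the exact inverse of a perturbed matrix with the approximation $D^{-1}[I - \Delta D^{-1} + \Delta D^{-1}\Delta D^{-1}]$, where $\Delta$ is the perturbation, and shows that the approximation error shrinks at a cubic rate in the size of the perturbation. The matrices and the function names `second_order_inverse` and `approximation_error` are arbitrary illustrative choices, not objects from the paper.

```python
import numpy as np

def second_order_inverse(D, D_hat):
    """Second order approximation of inv(D_hat) around inv(D)."""
    D_inv = np.linalg.inv(D)
    delta = D_hat - D
    identity = np.eye(D.shape[0])
    return D_inv @ (identity - delta @ D_inv + delta @ D_inv @ delta @ D_inv)

def approximation_error(eps):
    """Frobenius norm of the approximation error for a perturbation of size eps."""
    D = np.array([[2.0, 0.3, 0.0], [0.3, 1.5, 0.2], [0.0, 0.2, 1.0]])
    E = np.array([[1.0, 0.5, 0.0], [0.5, -1.0, 0.3], [0.0, 0.3, 0.8]])
    D_hat = D + eps * E
    exact = np.linalg.inv(D_hat)
    return np.linalg.norm(exact - second_order_inverse(D, D_hat))

for eps in (0.1, 0.01, 0.001):
    print(eps, approximation_error(eps))
```

Shrinking the perturbation by a factor of ten reduces the error by roughly a factor of one thousand, which is the third order remainder that the Taylor theorem argument relies on.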

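The terms $H_3$ and $H_4$ contain a Taylor series expansion of $\hat\Omega_M^{-1}$ around $\Omega_M^{-1}$ given by

(A.1)  $\hat\Omega_M^{-1} = \Omega_M^{-1} - \Omega_M^{-1}(\hat\Omega_M - \Omega_M)\Omega_M^{-1} + B + o_p\left(\|\hat\Omega_M - \Omega_M\|^2\right)$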


where  B  has  typical  element  k,l   given  by  vec(Qjvf  —  ^M)'ovecriavecn'  vec(r2;v/  —  ^m)-  The  term 

$o_p(\|\hat\Omega_M - \Omega_M\|^2) = o_p(1)$ by Lemma (B.9). Let $g_k(M) = M^{-q}$ for regular kernels $k(j/M)$ and
$g_k(M) = M^{s}\lambda^{M}$ for rate-adaptive kernels $k(j, M)$. The notation $g_k$ indicates the dependence of the rate
on the kernel used. Define the constant $c_k = 1$ for regular kernels and $c_k = -\log\lambda$ for rate-adaptive
kernels. In Lemmas (B.14) to (B.16) it is shown that $H_1 = H_{11} + H_{12} + H_{13} + H_{14}$ is

(A.2)  $H_{11} \equiv P_M'\Omega_M^{-1}P_M - P'\Omega^{-1}P = O(M^{2s}\lambda^{2M})$

(A.3)  $H_{12} \equiv P_M'(I - K_M)\Omega_M^{-1}(I - K_M)P_M = O(g_k(M)^2)$

(A.4)  $H_{13} \equiv -P_M'\Omega_M^{-1}(I - K_M)P_M = O(g_k(M))$

(A.5)  $H_{14} \equiv -P_M'(I - K_M)\Omega_M^{-1}P_M = O(g_k(M))$

where $\equiv$ means 'equal by definition'. In Lemmas (B.17) to (B.20) the term $H_2 = H_{211} + H_{212} + H_{221} + H_{222}$ is
shown to satisfy

(A.6)  H2U  ^  -[Pm-Pm)' KMn-jKM{PM-PM)  =  Op{M/n) 

(A.7)  H212  =  P'MKMni}KM{PM-PM)  +  {PM-PM)'KM^''^KMpM  =  Op{n-^'^) 

(A.8)  H221  =  -{PM-PM)'KMni.}K'M{PM-PM)=Op{M/n) 

(A.9)  H222  =  P'MKMnilKM{PM-PM)  +  {PM-PMyKMnjlKMPM  =  Op{M/n'f^). 

where  Pm  is  defined  in  Section  3  and  P'j^j  =  [t^^ , ...,  TYj]  with  f^^  =  n"!  Er=maxO+i,r-m)+i  '^tJ- 
Lemmas (B.21) and (B.22) show that $H_3 = H_{31} + H_{32} + H_{33} + H_{34}$ is

(A.IO)  //31      =      {pM-PM)'KM^li{^M-^M)^MKM{PM-PM)=Oj,{M/n) 

(A.ll)  H^2      =      -P'MKM^M{^M-^MWMKM{pM-PM)  =  Op{M/n) 

(A.12)  Hyi      =      -{pM-PMYKMn~J{ClM-nM)^MKMPM  =  Op{M/n) 

(A.13)  Hm    =     -P'j„KMn-^l{hM-nM)^-^KMPM  =  Op{n-"^) 

and $H_4$, which is a remainder term defined as

(A.14)  Hi  =  P'l^KMinjI  -  n-J  +  nilih^  -  ^uWm)KmPm  =  Op{M/n) 

where  the  last  equality  follows  from  Lemma  (B.23). 
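
The two rate functions that govern the order of the terms above behave very differently as M grows. The following sketch evaluates $g_k(M) = M^{-q}$ for regular kernels and $g_k(M) = M^{s}\lambda^{M}$ for rate-adaptive kernels, together with the constant $c_k$. The function names and the numerical values of q, s and $\lambda$ are assumptions chosen only for illustration.

```python
import math

def g_regular(M, q):
    """Rate g_k(M) = M^{-q} for a regular kernel k(j/M)."""
    return M ** (-q)

def g_rate_adaptive(M, s, lam):
    """Rate g_k(M) = M^s * lam^M for a rate-adaptive kernel k(j, M), 0 < lam < 1."""
    return M ** s * lam ** M

def c_k(rate_adaptive, lam=None):
    """Constant c_k: 1 for regular kernels, -log(lam) for rate-adaptive kernels."""
    return -math.log(lam) if rate_adaptive else 1.0

for M in (10, 50, 100):
    print(M, g_regular(M, q=2), g_rate_adaptive(M, s=1, lam=0.8))
```

For small M the polynomial factor can dominate, but the geometric factor eventually makes the rate-adaptive rate vanish much faster than any fixed power of M.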

Next  we  turn  to  the  analysis  of  d^;  which  is  decomposed  as  dfc  =  ^   dj.  Define 


-V2^,;^^,...,,-i/2^,; 


M 
t  t 


with  V  =  Voo  such  that  it  follows  from  Lemmas  (B.24)  to  (B.33)  that 

(A.15)  d0  =  P'n-^v  =  0pii)

(A.16)  d1  =  Pi,njlVM-p'n-'v  =  Op{APx")

(A.17)  d2  =  P'm{I  -  KM)nj}{I  -  Km)Vm  =  Op{gk  {Mf)

(A.18)  d3  =  -P'j,,{I  -  Km)Qi}Vm  -  P'^njlil  -  KmWm  =  Op{gk  (M))

(A.19)  d4  =  [Pm  -  Pm)'  RmQ-IKmYm  =  Op{M/n)

(A.20)  d5  =  {Pm  -  Pm)'  Km^IJKmVm  =  Op{M/v}'^)

(A.21)  d6  =  [Pm  -  Pm)  Km^m{^m  -  ^mWmKmVm  =  OpiM/n)

(A.22)  d7  =  P'mKm9.i}{^m  -  hM)^-^KMVM  =  Op(n-i/2)

(A.23)  d8  =  P'i^KmBKmVm  +  Op  {M/n)  =  Op{M/n)

(A.24)  d9  =  n-'/^Z^tpM^MnilKM  [1m  ®  (y  -  My)]  =  Op{M/7i^^-).

We first focus on regular kernels where $g_k(M) = M^{-q}$. We consider the terms in the expansion
$D^{-1}\sum_{i=0}^{9} d_i - D^{-1}\sum_{i=1}^{4}\sum_{j=0}^{9} H_i D^{-1} d_j$ of the estimator which depend on $M$ and $n$ and are largest
in probability. From the results in Equations (A.2) to (A.24) it follows that the largest such terms are
$H_{12}$, $H_{13}$, $H_{14}$, $H_{222}$, $d_0$, $d_2$, $d_3$ and $d_5$. Of those terms we examine cross products of the form $E d_i d_j'$,
Edid'^D-'^Hi  and  EH^D-^dod'^D-'^Hj.  Letting 6^^^  =  c^'^k'^  hniM^oo  {Hn  +  Hu)  /gk{M),  the  largest 
terms  vanishing  at  rate  M~^  as  Af  — »■  00  are  Edod'^  =  —M~''kqB^  +  o{M~'^)  as  shown  in  Lemma 
(B.39)  and  -Edod'^D-'^iHis  +  Hu)  =  M-^/c^B^^^  +  o{M-i)  by  Lemmas  (B.24)  and  (B.42).  The  two 
terms  cancel  because  they  are  of  opposite  sign. 

Now  define  B^"^  =  k'^  limM-00  P'm{I-Km)^m{I-Km)Pm lgk{Mf.  Terms  of  order  M^^?  include 
£■^0^2  =  M-29fc2B^9)  +  o{M-''-i)  by  Lemma  (B.38)  and  -Edod'^D'^H'y^  =  -M'^'^k'^B^^^  +  o{M-'^i) 
by  Lemma  (B.35).  Since  Edod'2  and  —Ed^d'^D'^ H'12  are  of  opposite  sign  these  terms  cancel.  We  are 
left  with  E{dz  -  {Hn  +  Hu)D'^do){d3  -  {Hu  +  Hu)D~'^doy  =  0[M-^'i)  by  Lemmas  (B.16),  (B.24), 
(B.39)  and  (B.43). 

Terms  that  grow  with  M  and  are  largest  in  order  are  if222-D~^do  and  ^5.  It  follows  by  Lemma  (B.41) 
that  the  cross  product  term  i?i^222-C~^rfo'^5  is  of  lower  order.  We  are  left  with  EH222D~^ dQd'QD~^ H'222  =
0{n-^)  by  Lemma  (B.40)  and  Edr^d'c,  =  0{M'^/n)  by  Lemma  (B.44).  Then  (/;„  {MJ,  k{.))  =  0{M^/n)  + 
OiM^^"). 

Next  we  turn  to  the  case  of  the  rate-adaptive  kernel  where  gk{M)  =  NPX  .  Now  H\\,  H12,  H13, 
Hu,  H222:  do,  di,  d2,  ds  and  ds  are  largest  in  probability.  In  Lemmas  (B.34)  and  (B.36)  we  show 
that  Edod'QD~^Hii  =  Ed^d'^  such  that  these  terms  cancel  out.  Because  Edod[  =  Hu  +  o{n~^)  by 
Lemma  (B.36)  it  follows  that  Edod'^D^^  {H13  +  Hu)  is  of  lower  order.  The  largest  terms  remaining 
are  therefore  Edid[  =  M^'X'^'^^Bi  +  o  {gk{M)^)  where  Sj  =  liniA./^oo  -Hu/9k{M)'^  and  £(^3  -  {Hn  + 
Hu)D-'^do){d3  -  (//i3  +  Hu)D-'^do)'  =  0{M'^'\^'^')  by  Lemmas  (B.16),  (B.24),  (B.25),  (B.39)  and 
(B.43).  The  largest  term  growing  with  M  is  not  affected  by  the  kernel  choice  and  is  therefore  Ed^d'^  = 
0[M'^/n)  as  before. 

For  part  ii)  and  iii)  we  only  need  to  consider  the  terms  An  =  Ed^d'^  and  Bn  =  E{d3  —  {H13  + 
Hu)D-'^do){dz  -  {Hi3  +  Hu)D-'^do)'  +  Edid[.  Since  for  all  n  >  1  we  have  An  >  0  and  B„  >  0  it 
follows  that  lim  inf „  A^  >  0  and  lim  inf„  Bn  >  0  such  that  A,  S^''^  and  B\  are  nonnegative. 

From  Lemma  (B.44)  it  follows  that 

El'D-^'^d^d!^D-^/^l  =  mV"  (  1 4?{x)d:^   (.' D'^'-A^^AiD-^'-l  +  o{M^/n). 

From  Lemma  (B.39)  it  follows  that  M^\^' Ed^d-i  =  -kqB[''^  +  o(l)  and  from  Lemma  (B.24)  it  follows
that  Edod^  =  D  +  o{l)  such  that 

M^'X^^^E{Hy3  +  Hu)D-'dod'oD-\Hn  +  Hu)'  =  c^''fc2g(9)^-ig(9)'  +  ^^i^_ 

This  implies  that 

E{Hu  +  Hu)D-^dQd!^  -  E{Hn  +  Hii)D-Uod'^D-\Hn  +  H^i)'  =  o{Af~'X'") 

or  in  other  words  B„  =  Edsd'g  -  E{Hn  +  Hu)D-'^dod'oD-'^{Hn  +  H^)  +  o{M'^'X-^^).  Here  Ed^d'^  = 
^j2s;^2A/g(g)  j^  o(Af2^\2W)  ^  ^^^^^,^  -^^  Lemma  (B.43)  where  ^"^  is  defined  in  (B.19).  ■ 

Proof  of  Proposition  (3.4):  First  note  that  by  Lemma  (B.25) 

1  -  au-r     =    l'D''/^{D  -  P'm^-^Pm)D-^I'^1  =  i'D-^^^Edid[D-^/^£  +  o{max{n-\M^'X'^^'')) 
=    A'f'X^^'Bi  +  o(max(n-\  7\/2^a2^')). 

Since  D  —  P'j^,jQjj  Pm   >  0  by  standard  arguments  it  follows  that  1  —  a\M   |   0  as  M  — »  oo.  Also, 
QajPm  =  -^13  +  -^14  such  that 

bM^MbM      =      QMnMQ[„-{H,3  +  Hu){P'Mni}PMy^{Hi3  +  Hu)' 

where  the  last  line  follows  from  b[''^    =   c^'^k-^  \im{Hu  +  H 14)  /gk{M)   and  the  fact  that  b!^^   = 
Cq  '^k^~\imQMQi\[Q',^i/gk{M)~  by  similar  arguments  as  in  the  proof  of  Lemma  (B.43).  More  specifi- 
cally,  consider  for  example,  Qa/Pa/  =  Hn+Hu  with  Hn  =  A/'A'"  E'/.j,=i  ^"'  IJiI"  ]J^\^jft^^n,h^-\- 
We  use  the  chain  rule  of  differentiation  to  write 

,.      1  -  k(i,  M)      ,.     1  -  <p(k(t/M).APX'^^)  1  -  k{i/M)         1 
lim TT-  =  hm  - 


"m    \i\'^AP^X^'        M  l-k{i/M)  (|i|/Af)'    MU'PX^'' 

Because  d<t){v,z)/dv  =  2  —  z[—  log 2)^  +  2{z[—  logz)''  —  \)v  is  continuous  in  v  and  z  ]  0,  it  follows  that 
I  -  (P{k{i/M).ciAPX^\\  +  {om{1))) 


lim 


M 


=       hm       — — h  (fc  -  1) — 1-  log  A  + 


A/-00  \      M  '  '       M  *  M  J        {i/M}'' 

W    ,;„     /        logCl     ,    r,.       ,^-logA/    ,    ^     ,     -  log(l  +  Oa/(1))  ^  ^ 


=    A:,(-logA)«    lim       777^  +  {k  -  1)   ,  ^T         +  1  +  (1  +  oa/(1)) 

^  M^<x>\M\ogX  A/logA  A/logA  ' 

=     A;,(-logAr 

Here, $o_M(1)$ is a term that goes to zero as $M \to \infty$. Define $\sigma_M = \sigma_{1M} - \sigma_{2M}$. It follows that
$\sigma_M = 1 + O(M^{2s}\lambda^{2M})$. Let $x = 1 - \sigma_{1M} + \sigma_{2M}$ such that $-\log(1 - x) = x + o(x)$. Then, $-\log\sigma_M =$
1  -  o-iM  +  0-2M  +  o{x)  =  M'^'X^^  {kqB^'^^  +Bi)  +  o(M2^A2^0-  Since  M*  ^  oo  as  n  ^  oo  it  follows  that 
iPn{M*,£,  fc(.))  =  MIC(M*)  (1  +  o(l)) .  The  result  then  follows  by  the  same  arguments  as  in  Hannan 
and  Deistler  (1988,  p.333).  If  in  addition  log  a im  =  ciM2^A2^^4-o(A^^)  with  A^  such  that  0  <  A^  <  A^/^ 
then  1  —  CTiM  =  ciM^^A^^  +  o(A^^)  and  the  same  holds  for  a2M  by  construction  of  k{., .).  Then,  it 
follows  that  (p^{M*,i,  k{.))  =  MIC(M*)  (l  +  0{-n}l'^  {\ognf^)\  by  the  same  arguments  as  in  the  proof 
of  Proposition  (4.3).  . 
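
The chain rule step in the proof above relies on the derivative of the function $\phi(v, z)$ that maps a regular kernel value $v = k(j/M)$ into a rate-adaptive kernel. The sketch below takes $1 - \phi(v, z) = (1 - v)(1 - v + v z(-\log z)^{q})$, which is the form consistent with the stated derivative $2 - z(-\log z)^{q} + 2(z(-\log z)^{q} - 1)v$, and checks that derivative by finite differences. Reading the exponent as q is an assumption, and the function names are hypothetical.

```python
import math

def phi(v, z, q):
    """phi(v, z) with 1 - phi(v, z) = (1 - v)(1 - v + v*z*(-log z)^q), for 0 < z < 1."""
    c = z * (-math.log(z)) ** q
    return 1.0 - (1.0 - v) * (1.0 - v + v * c)

def dphi_dv(v, z, q):
    """Analytic derivative of phi with respect to v."""
    c = z * (-math.log(z)) ** q
    return 2.0 - c + 2.0 * (c - 1.0) * v

v, z, q, h = 0.3, 0.2, 2, 1e-6
numeric = (phi(v + h, z, q) - phi(v - h, z, q)) / (2 * h)
print(numeric, dphi_dv(v, z, q))
print(phi(0.0, z, q), phi(1.0, z, q))   # boundary values of the transformation
```

Under this form, $\phi(1, z) = 1$, so the weight at lag zero is unaffected, while weights of kernels with $k < 1$ are shifted by an amount governed by $z(-\log z)^{q}$.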

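Proof of Proposition (4.1): Since $x_t$ contains elements of $y_t$ it is enough to show without loss
of generality that $\sum_j \hat c_{j+m}\hat\Gamma^{yy}_{j-k}$ is $\sqrt{n}$-consistent. Let $\tilde\beta$ be a $\sqrt{n}$-consistent first stage estimate. The
estimated residuals $\hat\varepsilon_t = (y_t - \bar y) - \tilde\beta'(x_t - \bar x)$ are used to estimate $\theta$. Let $g(\lambda, \theta) = |\theta(e^{i\lambda})|^2$ with
$\theta(z) = 1 - \theta_1 z - \dots - \theta_{m-1}z^{m-1}$. Define the parameter space $\Theta_1 \subset \mathbb{R}^{m-1}$ such that $\theta = (\theta_1, \dots, \theta_{m-1}) \in \Theta_1$
if $\theta(z) \neq 0$ for $|z| \leq 1$. By Assumption (B) $\exists\, \Theta_2 \subset \operatorname{int}\Theta_1$, $\Theta_2$ compact such that $\theta_0 \in \Theta_2$.

The periodogram of $\hat\varepsilon_t$ is $\hat I_n(\lambda) = n^{-1}\sum_{t,s}\hat\varepsilon_t\hat\varepsilon_s e^{i\lambda(t-s)}$. The maximum likelihood estimator for $\theta$ is
asymptotically equivalent to

(A.25)  $\hat\theta = \arg\min_{\theta} \hat A_n(\theta)$

with $\hat A_n(\theta) = n^{-1}\sum_j \hat I_n(\lambda_j)/g(\lambda_j, \theta)$ for $\lambda_j = 2\pi j/n$, $j = -n+1, \dots, 0, \dots, n-1$. Define $I_n^{\varepsilon}(\lambda) =$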
""'  Et,s  ^tese'^^'-'\  /r (A)  =  n-i  Zt,s  ^t{xs  -  l^,)e'^^'-'\  /^ (A)  =  n"!  (do  -  ao)  EtM  -  ^Je'^^*-^) , 
/^-(A)  =  n-i(do  -  ao)  Zt,s  ete'^^'-'^  and  /°(A)  =  n'^a^  -  ao)^  Et,.  e'^^*"^)  for  do  -  ao  =  y  -  My  " 
/3  (x  —  Mx)-  It  follows  that 

/^(A)    =    4(A) +  (^ -/?)'/;?  (A)  (^-^)  +  /;?(A) 

+2  (^  -  /3)'/r  (A)  +  2/r(A)  +  2(do  -  ao)(^  -  /?)'/,?" (A)- 

Note  that  /-(A,)  =  /^ (A,)  =  /^°(A,)  =  0  for  j  #  0  and  /-(A,)  =  n(do-ao)2,  /^"(A,)  =  (do-ao)  Et  £* 
for  j  =  0.  We  now  have 


A^„(0)  =  A-„(e)  +  2(/3  -  (3)'K^{9)  +  (/?  -  /3)'A^(^)(/3  -  /3) 


2(do  -  aQ)n   ^  ^(cf  +  (xt  -  Mx))  +  ("o  -  ^o)^ 


/5(O,0). 


From  standard  arguments  (see  Brockwell  and  Davis  1991,  ch.  10)  it  follows  that  K'^{9)  °^  A°''{9) 
with  A''''{9)  =  2n  j  f^b{X)/9iX,9)dX  and  d'^hf  {6)189  "4-  d^A''^{9)/d9  for  /t  <  oo  such  that  Ai{9)  -^ 
2n  f  fi.ir[X)/g{X,9)dX  uniformly  in  6*  6  02-  Consistency  of  9  follows  from  standard  arguments. 
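
The frequency domain criterion in (A.25) can be sketched numerically. The code below computes the periodogram of a residual series with the FFT, evaluates $g(\lambda, \theta) = |\theta(e^{i\lambda})|^2$ for a moving average polynomial, and minimizes the averaged ratio over a grid. It is a simplified illustration: it uses the Fourier frequencies $j = 0, \dots, n-1$ rather than the range in the text, a grid search rather than a numerical optimizer, and simulated MA(1) residuals; all function names are hypothetical.

```python
import numpy as np

def periodogram(eps):
    """I_n(lambda_j) = |sum_t eps_t e^{i lambda_j t}|^2 / n at lambda_j = 2*pi*j/n."""
    eps = np.asarray(eps, dtype=float)
    return np.abs(np.fft.fft(eps)) ** 2 / len(eps)

def g(lam, theta):
    """g(lambda, theta) = |theta(e^{i lambda})|^2 with theta(z) = 1 - theta_1 z - ..."""
    theta = np.atleast_1d(theta)
    z = np.exp(1j * lam)
    poly = 1.0 - sum(th * z ** (k + 1) for k, th in enumerate(theta))
    return np.abs(poly) ** 2

def whittle_objective(theta, eps):
    """Sample criterion n^{-1} sum_j I_n(lambda_j) / g(lambda_j, theta)."""
    n = len(eps)
    lam = 2.0 * np.pi * np.arange(n) / n
    return float(np.mean(periodogram(eps) / g(lam, theta)))

# Simulated MA(1) residuals eps_t = u_t - 0.5 u_{t-1}, so theta_1 = 0.5.
rng = np.random.default_rng(0)
u = rng.standard_normal(4001)
eps = u[1:] - 0.5 * u[:-1]
grid = np.arange(-0.9, 0.91, 0.05)
theta_hat = grid[np.argmin([whittle_objective(th, eps) for th in grid])]
print(theta_hat)
```

The grid is restricted to values with $|\theta_1| < 1$, mirroring the requirement that $\theta(z)$ has no zeros on or inside the unit circle.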

To  establish  $\sqrt{n}$-consistency  note  that  ^yn^A^^{9o)/^6  =  Op{l),  n   ^^'"^^t^t  -  ^p(l)  ^i^d

Therefore 

(A.26)  V^dAi{9)/d9  =  V^dA'„{9)/de  +  ^iQi  -  (i)' d h.'^ [9) / d9  +  Op  (1) . 

We  also  define  dA'{9)/d9  =  2n  J  f,{X)dg~'^{X,  9)/d9d\  such  that 


(A.27) 


dA'[9) 


09 


< 


dA%9)       dAi{9) 


89 


89 


+ 


8Ai{9)       8Ai{9o) 


89 


89 


+ 


dAU0o 


89 


where  ||(9A^(6'o)/56'||  =  Op{n-^/-)  by  (A.26).  Definition  (A. 25)  for  9  implies  that 


dAi{9)       OAiieo) 


Finally 


89 


dAi{9)       dA'{d) 


89 


<  2 


dAi{9o) 


89 


=  0p(n-i/2) 


80 


89 


=    dA'„{9)/89-  I  27Tf,{X)dg-\\,9)/89dX 
+20-l3nd{A'-{9o)))/d9  +  Op{l) 


where the second term is $O_p(n^{-1/2})$ because the first stage estimate of $\beta$ is $\sqrt{n}$-consistent. The first term can be written as
8A'^{9)/d9  -  j 2TTf,{X)8g-\X,9)/d9d\ 

=  n-i  Y.  \^n{^3)  -  2^A(A,)1  dg-\X,:9)/de 

+  n-'Y.27TMXj)8g-\Xj,9)/d9-   i  2^f,{X)dg-'{X,9)/89dX 


where  the  second  term  is  0{n-^).  Now  define  e,^{e)  =  (27r)-i  /  ^^-^(A,  9)/0ee'^^dX  such  that  ^^"^(A,  9)/8e 
Ej^jWe-'^^and 

n-'Y.  !^n(^^)  -  2^A(^i)]  dg-\X,r0)/O9 


n  CO 


J     t,s=l  /=-oo 
n     n-|min(/,0)| 

=    n-'  Y        E       {et£t-l-Est€t-i)U0)  +  Op{n-') 

l——n  t=max(/,l) 


< 


Ei'i"'(""'E,(^*^'-'-^-^'^^-') 


MO 


/     n  \  1/2 
where  the  second  equality  follows  from  n   ^  ^  ■  e''^^^*   ^)  =  0  for  i  7^  s  and  the  inequality  follows  from
the  Cauchy-Schwarz  inequality.  Then  note  that  n"^  Ylt  i^t  ~  ^^t)  =  Op{n~^^^), 


E 


l=  —  n  l=  —  n 

and  Yl?=-n  I'l   ^K^)'^  is  uniformly  converging  for  9  with  \9  —  6o[  <  5  for  some  5  >  0  such  that  6{z)  has  no 
zeros  on  or  inside  the  unit  circle.  Consistency  of  6  then  implies  X^JL-n  Kl   q(^)^  =  Op{l).  These  results 


establish  that 


dA%{9)  _  dA^0) 
09  00 


Op{n~'^l'^).  From  (A.27)  it  then  follows  that 


09 


Op(n-i/2) 


such that by a continuity argument $\sqrt{n}(\hat\theta - \theta_0) = O_p(1)$.

To  show  consistency  of  Y^-  C,j+m^Y-k  ~  Jlj  ^j+rrJ'^-k  +  ^  J2j  Cj-i^nS'^-k  ^^  consider  without  loss 
of  generality  YljCj+m^]^k  since  xt  is  composed  of  elements  of  yt,yt-i,  ■■■,yt-r-  We  next  show  that 
y-n-l         7        fyy     __  yn-1         A        -pw     _Q  fn-l/SN    Write 


7)-l 


n-1 


n-1 


7  >      Cj+mrj_fc       2^Cj+mrj_fc—       2^      Cj  +  m  (^r^-fc       r^-fcj  Z^      (Cj+m       Cj+m  j  r^-fc- 


j=— n+1 

First  consider 


-^^^  E  ||c.-c, 


j  =  -7T+l 


rj!. 


j=-n+l 


r!-l 


<ni/2sup  c,-c,|  E  If,-/ 


where  P(supj    Cj  ~  Cy    >  C''^   ^    )  goes  to  zero  for  some  C  large  enough  by  the  previous  result. 
For  any  S  such  that  \9  —  0q\  <  S  implies  9{z)  has  no  zeros  on  or  inside  the  unit  circle  consider 


(„l/2^J- 


71+1 


J-k        ^ j-k 


>ri       < 


<  p  ui/2 


En-l 


(^)l 


^ j-k        ^ j-k 


>T) 


+p{J9-9o\  >S 


We  use  the  triangle  inequality    r^_^  -  r^_^


< 


^n-l 


nV2     sup     E     \J0WI 


ryy    _  ryj' 
1 j-k     ^ j-k 


-pyy    __  ryy 

'■  J-k         ^  J-k 


+ 


ryy    _  -pyy 


such  that 


Op{l) 


by  Equation  (B.9)  and  the  fact  that  supi^.g^ji^^  XI":=i„+i  |Cj  (^)|  =  0{1)  uniformly  in  n.  In  the  same 
way  it  follows  from  Equation  (B.IO)  that 


^r^-l 


ni/2     sup     E"     ,JC,(^)I^ 


S-fc      '■j-k 


0(1). 


This  establishes  that  Ai  —  Ai  =  Op{n    ^/^).  The  result  then  follows  by  Lemma  (B.47). 

Proof  of  Proposition  (4.3):    Let  L^iM)  =  -log(j^,^^  +  ^A  (j'^^(l){x)'^dx)     and  L„(A/)


—  log (Tm  +  —A  ij^   (p{x)^dx]    .  We  first  show  that  uniformly  in  M, 
(A.28)  L„(M)  =  Ln{M)  (l  +  Op{n-^/^  (logn)'/2+5' j 

with  s'  =  si  <  Ai  >  Ai  >  +  si  <  Ai  <  Ai  I .  Consider 


L„(M)  -  Ln{M) 


Ln{M) 


< 


^°g'^Aa-i°g'^ 


M 


log  0-2 


M 


+ 


(^-I) 


.4 


because  L„(Af)  >  M'^/nA  (j^^(p{xfdx')  and  cr^;  >  1.  By  Proposition  (4.1)  and  Lemma  (B.47)  it 
follows  that  A-A  =  Op{n-^^'^).  Next,  note  that  logcr|^  =  ciM'^'X-"'  +  o  (A/^^A^^)  and  loga^^^  = 
1).  Let  g{M)  =  \M\' X^^^^,  g'{M)  =  |Af  I'-'+^'Z^  a'^^I,  gr{M)  =  |A/|''-^  a'^^'  and 


M.h 


1  +  °p(^L,h 


gi){A4)  =  |Af  |'^~^  a[^'  .  First  show 

(A.2^(A//)-2Af-^'  (<7^Aa-^?A0     =    giMy'M-^'  \e'D7'^^H,^jDl'^^£  -  fD-'/^HuD-'^-'e 


=     Op(n-i/2  (iogn)i/2) 


n.h  =  Kuh^llh^Mh-h-  I*  >^  ^^°"gl^  *°  ^'^°'''  ^hat  ^n.A-^n  =  Op(n"i/2  (iog„)i/2^/(M)2) 


with  // 

First  we  analyze  individual  components  oi H^^  j^  —  Hu.  Note  that 


by  Lemma  (B.46)  such  that 


< 


=  Op(n-V2(logn)^/2j5r(j)) 


pyx 


=  0(.gr(j)).  To 


(1  +  Op{n~^/'^  (logn)^/^  j)  because 

analyze  terms  involving  -dj^j^+M  ^'^  use  the  expansion  Qy^—Q~^  =  Q~^  iO.  —  Qf^]  Q~^+Op  (n~^/^y/\ogn) 
such  that 


(A.30) 

^^:...H 

Note  that 

'^k.iA  -  ^k.z 

m— 1 
/=-m+l 

=     Op(n-i/2  |z  -  k\ gr{i  -  k)  [logn)'/'') 
by  Lemma  (B.46).  It  then  follows  that 

^n,n+M:h  -  ^n^n+AfW  =  Opin^'/^  (logn)^/^  M^^X'') 

where  S2  =  2.si  —  1  if  Aj  <  Ai,  S2  =  si  +  2.si  if  Xi  =  Aj  and  S2  =  si  if  Aj  >  Aj.  This  can  be  seen  by 
noting  that  X]w=i^ji.t  i^kih  ~ '^k.i)  '^i.j2+M  is  of  the  same  order  as  the  M-th  autocorrelation  of  an
AR  process  with  lag  polynomial  (1  -  AiL)^'"*"^  ( 1  -  AiL  1     .  Consider  now  the  largest  order  element  of
H,,r-  Hn,  namely  P'    -  ffi"^- 

11."  ^^'  •'       M,h  \     MJi 


"r 


^MA  ~  P'm  {^m  -  [^   ^]m)  ^^^-  ^^^^  largest  order  terms 


in  this  difference  are  of  the  form 


OO  / 

J1,..,J6  =  1 


Ji,J2,n 


h  "  '^h,J2  I    {'^J2J3+Ml^j3,j4'^J4  +  M,j5)'dji,je^^-j 


J6 


and  taking  into  account  (A. 30)  we  need  to  consider 


j:,...,J8=l 


Jl       JT.J2    ^      J2,]3,h  J2J3 


'^33  , J4  ('^J4  ,35  +  M'"35  ,36 ^J6  +  M ,h  ) ^  J7  .js  ^ 


yx 
-3a 


Op{n-^/^^/I^M''X^^) 


where  S3  =  4si  —  1  if  Ai  <  Ai,  S2  =  5si  +  4si  —  1  if  Aj  =  Ai  and  S2  =  5si  —  1  if  Ai  >  Ai  where 
S3  =  2s  +  s'.  The  remaining  terms  of  H^^  ^  —  Hu  are  of  smaller  order.  This  establishes  (A. 29). 

^2M  h  ~~  '-'^2M   /g'i^I)^  =  C)p(l).  We  focus  on  a  typical  term  in  cr^j^f^,  the 


Next,  show  that  \/nsup 
matrix 


Sm  =  P'M,h  yM  -  Km)  U~^ ^  (Im  -  Km)  PM,h 
with  population  analogue  Sm-  Other  terms  of  cr^.,f^  can  be  handled  in  a  similar  way.     Let  sm  = 
VlogCTiM,  5A/  =  J'^oga^j^jj^,  (t>j  =  (f){k{j/M),SM)  and  (/)j  =  <p{k{j/M),SM).  The  matrix  A'm  con- 


M 


tains  diagonal  elements  ^j.  Note  that  1  —  (j){v,  z)  =  (1  —  i;)(l  —  u  +  vz  (—  logz)')  such  that 

|1  -  kU/M)\  \SM  (-  logSM)^  -  SM  (-  log  sm)"!  .  Also  let  A,,,,  =  Tj^^,^- 'r^4  and  Af^^.^  =  t^' f^^Z^'mr':. 


< 

1 


-32,h 


Now, 


E  ((1  -  '^.0  (1  -  ^2)  -  (1  -  <p3.)  (1  -  <^.J)  ^'L 

=  E  {  (^.:  -  <^.0   {^2  -  <^..)  +  (<^..  -  ^.0  (1  -  <l>32)  +  (1  -  ^3^)  {<P32  -  h^)  }  ^j!,. 


3\,32 


31,32 


such  that 

Sm  -  Sm 


l9{Mf 


<     2  sup 

A/ 


SM       /-logs/ 


+2  sup 

M 

M 


9'{M) 


M 


SM 


g'{M) 


M       J        g'{M) 

logSA/V  SM 


log  SM 

M 


2    M 


M 


log  SM  \  '^ 

M      J 


3132 
M 


1  -  k{n/M) 


■E 

J1J2 


l-HkUi/M),SM) 


9'{M) 


9'{M) 
l-<P{k{n/M),SM 


1  -  kin/M) 


g'{M) 


E 

\32 

M 

E 

3\32 
aM       _    A 

31 ,32  ^Jl  J2 


1  -  k{j2/M) 


^31 ,32 


[Ji/Mf 


{J2/Mf 

l~4>ik{J2/M),SM) 


9{M) 


^3i  ,32 


lM 
where


sup 

M 


SM       (  -  log  SM 


SM       (  -  log  SM 


g'{M)  V      M      )        g'{M)  \      M 

<  Ci  sup  I  (sM  -  Sm)  /g'{M)\  +C2SUP 


log  SM 

M 


log  SM 

M 


Op(7l-l/2 


(logn)i/2) 


where  ci  =  supyi,;  (-  log  [M^X    )  /M)    and  C2  =  2 sup  \criM/g'{M)\  ■  The  result  then  follows  because


(log  (AFA^^)  /M)  is  uniformly  continuous  in'M  and  sup  |(sa./  -  sm)  /9'{M)\  =  Op{n~'^'^  (logn)^'  )  by 


(A. 29).  Also  note  that  J] 


py:c 
-J2,h 

=  Op{l)  with  probability  going  to  one.  By  the 


arguments  in  the  proof  of  Proposition  (3.4)  it  follows  that  [(l  -  (p{k{ji/M),  ^/iogaiM))  /g'{M)\  —>  0 
as  M  — >  oo.  By  the  same  arguments  as  in  the  proof  of  (A. 29)  it  can  be  shown  that 


M 


EiJii'iJ2 


J1J2 


-1 , 


pxy^M-'py^: 


"Jl       JlJ2 


n 


=  Op(n-i/2(iogn)V2) 


such  that  the  third  term  of  the  bound  for 


that 


5a/  —  5a/ 


5a/  -  5a/ 

.1/2 


lg'{Mf  is  Op(n-i/2(iogn)^/^).  It  thus  follows 


lg'{MY  =  Op{n~^'^  {\ogny''-)  uniformly  in  M.  We  have  therefore  established  (A.28). 


Let  Ln[M)  =  Ln{M)+g(M)^+logaM  where  g{M)^+]ogaM  =  0{\l'^^)  with  A^  <  A  by  Hannan  and
Kavalieris  (1986,  p.47).  Since  \l^  /g{Mf  =  {K/Xf'  M~-'  ^  0  as  M  ^  oo  it  follows  that  L„(M)  = 
L„{M)  (1  +  0{gr{M)^)  with  gr{M)  =  AF  (A^/A)^^ .  Let  M*  minimize  L„(A/).  By  the  same  arguments 
as  in  Hannan  and  Deistler  (1988)  it  now  follows  that  M* / M*  -  1  =  Op(l)  and  Ln{M*)l L„{M*)  = 
1  +  Op(l).  This  establishes  the  first  part  of  the  Proposition. 

Moreover,  optimality  of  A/*  implies  that  —  logcrA/-  +  i  +logCTA/-  +  "'"^c  >  0  for  some  constant  c. 
This  leads  to 

log  n  +  2s  log  (A/*  +  1)  +  2  (A/*  +  1)  log  A  <  log  (2M*  +  1  +  o{g{M'f-)) 


or 


logn  ^  log  (2Ar  +  1  +o(g(Ar)2)  -  2s  log  (A/*  +  1)  -  2  log  A 

-    < ^ —_ 2  log  A  ^-  —2  log  A. 


M*    -  M* 

In  a  similar  way  we  note  that  —  logUM'  +  logcrA/--i  +  ^'     ~^c  <  0  such  that 

logn       log(2Ar-l  +  o(5(Ar)2)  -  2s  log  (A/* +  1) 


M* 


> 


A/* 


21ogA  ->  -2  log  A. 


Thus,  logn/A/*  =  0(1).  This  implies  that  M*  =  -  logn/(21og  A)  +  Op(logn).  Optimality  of  M* 
then  implies  that  Ln  \M*j  =  Op((logn)  /n)  and  logo-^^-^.  ^  =  Op((logn)  /n).  We  have  seen  before 
that  log (7^^.^  =  logcTyiY. (1  +  C)p(7i~-'/-^  (logn)^''^'''*  )  such  that  logaj^-j,   =  Op((logn)^ /n)  which  in 
turn  implies  that  ^(M*)  =  A**M*^  =  Op(logn/ni/2).  Substituting  for  M*  =  -  logn/(21og  A)  +  e^.
with  e^.  =  Op(logn)  in  X^' M*^  shows  that  A'^m*  =  Op(logn)  if  s  =  0  and  Op(l)  otherwise.  Since 
A,-/A  <  A^/^  by  assumption  it  follows  that  {K/Xf''"  =  Op{y/\ogn)  if  s  =  0  and  Op(l)  otherwise.  Then, 
consider 

v-logn/21ogA  (     log"' 


gr  (m*)  =  (A./A)^M-  (A,/A)- 


-2  log  A 


+  OT,[gr{M*)). 


Note  that  (A,/A)^M-  (logn)^  =  Op((logn)^^+^/2)  for  all  s  such  that  5,  (m*)'  =  Oj,  ((A,/A)    l°s"/l°g^  (logn)2^+^ 


where 


(AjA)-'°*^"/'°^^  =  ('(A^/A)-'°s"/'°=(^'-/^)) 


log  Ar/log  A-l 


^^-(logA./IogA-l)^ 


But  (log  A^/ log  A  -  1)  >  1/2  if  A^  <  A3/2.  Then  g,  (a/*)    =  Op(n-V2  [Xognf^)  and 

Z„(M*)  <  L„(M*)  =  Ln{M*)  (1  +  o(g,(M*)2))  =  L„(M*)  (l  +  Op(n-i/2  (iogn)i/2+^')) 

where  the  last  equality  follows  from  (A. 28)  and  the  fact  that  M*   =  Op(logn).     Also,  Ln(M*)   < 
Ln{M*)  =  L„(Af*)(l  +  Op(n'-i/2  (logn)^/^"^^'))  by  similar  arguments  as  before.  It  follows  that 

(A.31)  Ln{M*)/Ln{M*)  =  l  +  Op(n-i/2(iogn)i/2+^'). 

A  second  order  mean  value  expansion  of  L„  [M*]  around  M*  leads  to 

ia2Z„(M*)  .- 


L„  (m*)  =  L„  (m*) 


+ 


M*  -  Ar 


where 


M-M" 


< 


M*  -  M* 


d^  (Ln{M))  I  [dMf  .  Then, 


2       (9A/2 
and  we  have  used  the  fact  that  dl^  (A'P^  jdM  =  0.  Let  0{M)  = 


M* 
M* 


1=0. 


Ln{M*)  -  Ln{M*] 


Ln{M*)  2M*^~e{M*) 


14^  I  =0p(n-V2  (log n)V2+^') 


follows  from  (A.31),  Z„(Af*)  <  L„(A?')  and 

Ln{M*)l  (2Ar2^(Af  *))  =  Op(l). 
This  result  implies  that  M*  -  M*  =  Opin'^'^M*  (log 72)^/^+''/^)  =  Op(l).  Similarly,  we  note  that  for 


M*  maximizing  Ln{M)  we  have 


(A.32) 


AT 


-  1 


Ln{M*)  -  L„{M*) 


Ln{M*) 
=  o{gr{M*f)


since  L„(M*)  <  Ln{M*)  =  L„(M*)  [l  +  o{gr{M*)^j)  and  L„{M*)  <  Ln{M*)  =  L(Ar)  {l  +  o(gr{M*)^)) 
where  ^.(M*)^  =  0(7i-i/2  (logn)^/").  We  have  thus  shown  that  lil*  -  M*  =  Opin'-'^/'^M*  (logn)^^'^). 
We  use  this  result  to  sharpen  the  convergence  result  for  M*.  By  the  same  arguments  as  in  Hannan  and 
Deistler  (1988,  p.333-334)  optimality  of  M*  implies  that 

LniM")  <  Ln{M*)  <  L„(Ar  )(1  +  Op(n-i/2  (iogn)'/^+'')) 

or,  by  rewriting  the  inequalities  as  0  <  L„(A"/')  -  Ln{M*)  <  Ln{M*)Op{n-^/^  (logn)^/'^"^*'), 

(A.33)  0  <  1^  -  1  +  A _ L-  <  j^^i^  +  1  j  OAn-^^^  (logn)V^-^  ) 

where  by  a  mean  value  expansion 

[giM*f-g{Arf)n  _  Qg^j^rf/dM^  ( NT  _    \ 


with 


M*  -  M* 


< 


M*  -  M* 


Note  that  g{M*fn/M*'^  =  0(1)  by  optimality  of  A/*.  Further- 
more, dg{M)'^/dM  =  c/(i\/)2  (2s/M  +  21og  A) .  By  the  first  order  condition  for  M*  it  follows  that 
dg{M*f/dM  +  2M* /n  =  0,  or  n/ M*dg{l\rf /dM  =  -1/2.  From 

dg{M*Y/dM       \M'  j  [2s/ M*  +  2  log  A) 

since  M*  -  M*  =  Op(l)  by  previous  arguments  it  follows  that  n/M*dg{M*)'^/dM  -^  1/2.  We  now 
rewrite  (A.33)  as 

IF  +  '+         .7-         ") 

such  that  the  result  follows.  ■ 
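
The logarithmic growth of the optimal number of instruments derived in this proof can be illustrated with a stylized criterion. The sketch below minimizes $c\,M^{2s}\lambda^{2M} + A\,M^{2}/n$ over integer M, which mimics the trade-off between the squared rate $g(M)^2 = M^{2s}\lambda^{2M}$ and the variance-type term $M^2/n$, and compares the minimizer with the leading term $-\log n/(2\log\lambda)$. The criterion, the constants c and A, and the function names are simplifying assumptions and do not reproduce the criterion $L_n(M)$ exactly.

```python
import math

def criterion(M, n, lam, s=0.0, c=1.0, A=1.0):
    """Stylized criterion c * M^(2s) * lam^(2M) + A * M^2 / n."""
    return c * M ** (2 * s) * lam ** (2 * M) + A * M ** 2 / n

def select_M(n, lam, s=0.0, c=1.0, A=1.0, M_max=200):
    """Integer M in 1..M_max minimizing the stylized criterion."""
    return min(range(1, M_max + 1), key=lambda M: criterion(M, n, lam, s, c, A))

def leading_rate(n, lam):
    """Leading term -log(n) / (2 log(lam)) of the optimal M."""
    return -math.log(n) / (2.0 * math.log(lam))

for n in (10**3, 10**4, 10**5, 10**6):
    print(n, select_M(n, lam=0.5), round(leading_rate(n, lam=0.5), 2))
```

The selected M grows by a roughly constant amount each time n is multiplied by ten, in line with a rate proportional to log n; the gap to the leading term reflects lower order terms that this stylized version does not model.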

Proof  of  Theorem  (4.4)  The  decomposition  v^  (h^^M-  -  K,M-)  =  Aw-  {^M-  "  ^m-)  ^}IJm-- 
D'j^j.{dj^j.  —  dM')  is  used.    Note  that  Dm-   =  Op(l)  and  di\f   =  Op(l).  The  following  calculations 


also  establish  D^-j,  =  Op(l)  and  dj^-j.  =  Op(l).  It  is  therefore  enough  to  show  that  %/n/M*{DM'  — 
D,;,.)  =  Op((logn)"^^'<(^'-'?'-i))  ^j^j  ,/^{/M~*{d^;.  ~dM-)  =  Opi{\ognr^^^'' -"•-''>).  Define  hf  = 
dia.g{(f){k{0).M'\'"), ...,  <p{k{{n  -  m  -  1)/Af ),  Af^A'^^))'  and  Km  =  {Jcm  ®  Ip)-  We  write  K^^,.  for  ma- 
trices with  elements  <P{k{{j)/M*),s^-j.)  where  s^-j.  is  defined  in  the  proof  of  Theorem  (4.3)  and  note 
that  for  the  truncated  kernel  K^^^.  is  a  matrix  where  the  first  pM*  diagonal  elements  are  one  and  all
the  other  elements  are  zero.  Let  Pn-m  =  '^"^-^'^n-m  and 


fi*A 


M 


Qm  0 

"         J-n-M—m 


,n 


M 


^M 


0 


•■n-M-Tn 


with  h\j  and  ^\j^  defined  in  the  same  way  replacing  fi^  and  Q,jJ  by  Q.m  and  ^n}-  Using  these 
definitions  we  can  rewrite  d-M  =  '^^^    P'n-m^ m^*m  ^MZ^-m^-  First  consider 


z: 


yf^lW*{d^,-dM')  =  V^W{K-mKM^n*7}k^.^^^-p'^_^kM'9rj^}kM'^ 


=      .MM~*{Pn-. 


^M-^*M>M.  +  ^M-n^->^,.  +  A,^.fi;r;^M. 


+kM'(p-*^}-^M')kM' 


n 


with  A^.  =  kj^,  —Km'  ■  From  Assumption  (E)  it  follows  that  for  some  constant  ci ,  k{j/M* )  —  k{j/M*] 
ci  br  [{l/M*y  -  (l/M*)")  .  Then, 

<P{k{j/Ar),Sj^.)  -  4>{kU/M*),SM') 

<  2C2  |5^.  (-logS^^.)'  -  SM'  (-logSM.)'l  +  |2  -  SM'  (-logSM-)"!  \KJ/M*)  -  k{j/M*) 

+  \SM'  {-logSM.y  -  1|  \k{j/M*f  -  k{j/M*f 

<  2c2  |s^^,  (-  logs,^.)'  -  SM'  (-  logSA/01  +  C3  br  l/M*"  ((AfVM 
for  some  constants  C2  and  C3  because 


*\2 


fcOYM*)^  -  k{j/M*) 


<2 


/c(j/M*)  -  A:(j7Ar)|  <  2ci  bf  1/M*'  ((a/VM*)'  -  1 
and  SM-  (—  log  SM-)'  ~*  0  for  A/^*  ~*  °o-  Now  note  that 

(5M  (- log  SM^  -  SA,  (- log  sm)'')  /  (g(M)M^+^')  I  |5(M*)M*^+^' 


<     sup 

M 


4-|s.>.  (-logS£,.)''-SA/.  (- log  SM' 


< 


By  the  proof  of  Proposition  (4.3)  it  follows  that  sup^  [hi  (-logSA/)'^  -  sm  (- logSA-/)'^)  /  U(M)M9+'''/^ 
is  Op{n'~^''^  (logn)  '   )  and  g{M*)  =  Op{n~^^^  logn)  such  that  the  first  term  in  the  previous  inequality 
is  Op{n-'^  (logn)^/^+''+'''''^)  =  Op{n-^/'^logn^/'^/M*).  For  the  second  term  we  write 


\^M'  {-'^OgSj^.j.y  -SM'  (-logSA/-)'| 


^mAzMIm^ 


SM'  (-logs A/* 


SM'  {-logSM-y 
where  sm  =  M'X"  +  o  {APX'^'^)  .  Then  (-  logs^-^.)'  /  (-  logSM')'  -  1  =  Op(n-i/2  (logn)^/2+//2-,  ^^^
Sm'/sm'  -  1  =  (m7M*)'  a(^^'-^^*)  -  1  =  Op(Ti-i/2  (logn)^/2+^''/^)  by  Theorem  (4.3)  and  the  delta 
method.  Since  .s a/.  (- log sm-)'  =  ^(tz'^/^  (log 71)''"'""^)  it  follows  that  |s^j.  (- logs^;^.)'' -  .sa;-  (-logSA/.)'|  = 
Opin-^l"^  \ogn^'"^/M*).  From  Theorem  (4.3)  we  also  have  l/Af*"  {(m*IM*X  -  l)  =  Op[n-^''^  (logn)^^^^''  /M*") 
such  that 


4>{k{3/hnrsM.)'<t>{H3lM*),SM.)    =Op(|j|7T-V^(logn) 


a/2+max(5'-i7,-l) 


)• 


For the truncated kernel $\Delta_{\hat M^*} = 0$ unless $\hat M^* \neq M^*$. But $P(|\hat M^* - M^*| \geq 1) = 0$ with probability
tending to one. So the rest of the proof is trivial for the truncated kernel and we only consider the more

general  case.  Denote  the  /c,j-th  element  of  fi'^^.^  by  'd*j^^,j^-  Then,  letting  fc(ji,  A/*)  =  (f){k{j/M*),Sp^-j.) 

y;VA^p;_A^.r2;ri/?A,.^j^  =  ^Rm*^Y.  E  fjf  [fc(ji,Ar)-fc(ji.Ar)]^;'^->(j2,Ar)t;,,, 

'=iii.j2=i 
=     sJnlKPC2  [fif  +  4  +  c^e"  +  ^f  +  rf^  +  4  +  '^'^] 

where  d^  =  ^  ^^=1  E "Xi  ^7.   [Mji,  A/*)  -  Mji,  M')]  ^;fi/c(j2/Ar)7;,,„ 


n        n—rtj 


<  =  ^Y.  E  (rjf-fjf)[A:(j:,Ar)-/c(ji,Ar)]^;fJ;fc(j2,Ar)t.,,,, 


(=1  Jl.J2  =  l 


and  similarly  for  d^, ...,  dg  corresponding  to  Definitions  (A.20-A.24)  for  cZ.5 dg  where  we  replace  Km 

by  A.-,,  and  Qa/  by  $1]^^  in  the  same  way  as  in  d,f.  We  consider  the  largest  term  ^5


^n/hPd^ 


<C, 


1 


n       n  —  m 


i^.OJn 


^/.1/2-P 


-1/2 


(log  n) 


1  /2+max(s'  —q.  —  l) 


t=l  JlJ2  =  l 

By  the  same  arguments  as  in  the  proof  of  Lemma  (B.29)  it  follows  that 


'E  E  i^ir||(f;r-r;r)^;5>(^2,Ar)., 


J2 


(A.34) 


n  —  m 

E  l^ili(flf -rjf)n'J>02,Ar)X:;^^t'u.||  =  Op(Ar) 


Jl-J2  =  l 


where  we  have  used  that  ^  |ji|''  ^^1  y^  <  oc  uniformly  in  j2  since  i^l^-j  has  the  same  summability 
properties  as  ^j^.j^-  Also  note  that  k{J2,M*)  =  0  for  |J2|  >  A/*.  The  bound  (A.34)  implies  that 
y/n/M*d^  =  Op{n~^/^  (logn)  +"'^''(*  ~i~  •').  Using  similar  arguments  based  on  the  proofs  of  Lemmas 
(B.28,B-30-B.33)  it  can  be  shown  that  the  remaining  terms  df,d^...,d^  are  of  smaller  order.  For  d'^ 


note  that  E"  ,_ ,  Ijil"    F'^t? 


jl  J2 


0(1)  such  that 


jlJ2  =  l 


yxy  „*M- 

]\       n  J2 


/       II  n  2\V2 
and  thus  ./f^jM^d^  =  Op((logn)'"'*''(^'-'''-'')


For  vWM^^n-mAM.^;^>M-%^  =  ^AVM^C2  [<^  +  ...  +  d^^  +  d^^]  we  define 


n       n—m 


d^^  = 


JlJ2 


^*)    ^n,2     Hj2,Mn-k{J2,M*] 


Vt,i; 


n 


£^iiEr=i^M2/x^ii' 


1/2 


and  similarly  for  the  other  terms.    From  YT]^^=\  \\h\'\hV 

0{\)  it  follows  that  ./WJWd^^  =  Oj,{n-^'^  (logn)i+2'"^''(^'-'?'-i)  M*-i/2)  =  Op(l).  For  ^f^^^JWd^^ 

note  that  for  some  Ci 

max(Air',M*) 

114^11  <  CrO,[n-'  (logn)^+2--(^'-'--^))        J]        l^'il"  l^^l'  ||  [^T.  '  ^^')  ^tn  E"=i  "^^^nl^ 

jlj2  =  l 

such  that  for  any  finite  e  >  0  and  some  C  consider 


\M'+t] 


■32 


>C     +P{M*  >  M*  +  e). 


where  \M*  +  e]  denotes  the  smallest  integer  larger  than  M*  +  e.  Using  the  Markov  inequality  it  follows 
that 


jl  J2  =  0 


1/2 


a*M' 
ji  J2 


(^||E"=i^*j=/^ 


1/2 


=  0{n-'/^) 


by  similar  arguments  as  in  the  proof  of  Lemma  (B.29).  Therefore  y/n/M*d^'^  =  Op{n   ^  (logn)-^"*"  ™'"'^^    ''' 
The  remaining  terms  are  of  smaller  order  by  the  same  arguments  as  before. 

Finally,  for  ^/^^JM^P^.^Km-  {p*j^}  -  ^IJ.^)  A'm-^^  we  expand  n*r}  around  Q*'}  and  n*-} 
around  ^*j^]}  as  in  (A.l)  leading  to 


^;>    -  ^M'    =  ^mH^Ii'  -  ^*M,)%7'    +  Op 


M 


^A/.  ~  ^W 


(A.35) 
and 

(A.36)  hi',}  =  n*^j}  -  nij}{nij,  -  ni,. )nij}  +  o^i 

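Note that $\hat\Omega^*_{\hat M^*} - \hat\Omega^*_{M^*} = 0$ if $\hat M^* = M^*$. Because $\hat M^*$ and $M^*$ are integer valued by definition it also
follows that $\hat M^* \neq M^*$ implies $|\hat M^* - M^*| \geq 1$. We thus note that for any $\varepsilon > 0$,
$\|\hat\Omega^*_{\hat M^*} - \hat\Omega^*_{M^*}\| > \varepsilon$ implies $|\hat M^* - M^*| \geq 1$. Using the fact that $P(|\hat M^* - M^*| \geq 1)$ tends to zero it follows that

$P(\|\hat\Omega^*_{\hat M^*} - \hat\Omega^*_{M^*}\| > \varepsilon) \leq P(|\hat M^* - M^*| \geq 1) \to 0,$
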


in  fact    n*-  ^  —  n^^.    converges  to  zero  in  probability  at  arbitrarily  fast  rates.  Also  note  that    0^,|.  —  (l 


'A/- 


Op{n-'^'^{\ogny''^)  by  Lemma  (B.9).  Combining  (A. 35)  and  (A. 36)  then  leads  to 


^*^}  -  ^11'  =  OM-{nij.  -  hl^.)OM.  +  Op( 


■j^,  --M-  -  "•■     \--Jw  ■  -   M' 

-)*— 1       o*  —  ^  let*  o*     ^n*— 1 


'  *  A  /  •    ~   '  *  /I 


'A/ 


A/* 


with  Om-  =  n;7.'  -  0*7/ (Q^,;.  -  Q^;.)^M-  ■  I^'  t'^us  follows  that 
Next  consider 

First,  we  analyze  P;_„A^,.  j^^;;A';,.F„_^  =  H^+H^  +  H^  where  //.-^  =  //2I1 +  ^.n2  +  ^2li +  ^222> 
H^  =  H^^  +  //g^  -\-  H^^  +  H^^  and  the  definitions  follow  from  the  definitions  in  (A.6)-(A.13)  with 
the  appropriate  substitutions  for  ^*^}  and  Aj^-^..  Furthermore  let  H^  =  '^^^QY7i=o^^^i^i'^/^P)  ~ 


^A/.f^;-;>A>  +  K^rK^!^^>  +  ^m-^I^^^m- 


Pn- 


,/^^*\\H^\\     <     y;i7AFOp(n-i/2(logn)^/2+max(.'~,,-i))    ^    |jil||r^f^*f;5;fc02/M*)r 
=    Op(M*'"^^(^'-'''-^))0(l). 


J2 


Now  consider  ^^^222 


^ 71  —  771 

Jl-J2=l 

and  by  the  proof  of  Lemma  (B.20)  it  follows  that  E  T,]~J!^=i  \ji  I"  (f  Jf  -  T^^)  ^;fj>(i2/A^*)f  r^,  = 
0{n-^/'^M*)  such  that  y^n/M*  \\H^22\\  =  Op(n-i/2  (logn)^^"'^''*''"''-"^^).  Using  the  results  of  Lemma 
(B.22)  we  can  show  in  the  same  way  that  y/n/M*  \\H^  ||  =  Op(l).  All  the  remaining  terms  are  of  lower 
order  by  Lemmas  (B.17-B.21). 

Next,  we  turn  to  P;,_„A^^.fi*r;A^^.P„-77z  =  H.^^  +  Hi'^  +  H^^  where  Hi-^,H^^,H^^  are 
defined  in  the  obvious  way.  It  follows  immediately  that 


y;VM^I|//^^||     <     y;V^Op(n-i(logn)i+2'"^''('^'-''--i))    ^    \n\\j2\ 

JlJ2  =  l 

=     Op(n-i/2  (logn)-^/2+2max(.'-9,-i)^_ 


pxy  n»A/"pyi 
For  H^^^  we  note  that  E  E",X=i  \h  I"  1721'  '^T.-^tn^^-n  =  ^(1)  such  that  again  ^^/M^  WH^^^  \\  =
Op(l).  The  same  type  of  arguments  also  establish  ^JnjM*  ^Hy^^  =  Op(l).  All  the  other  terms  are  of
lower  order. 

Finally,  we  turn  to  ^JnlM*P'^_^KM*  (^t-7.  ^  ^m*^  )  ^M'Pn-m  =  Op(l)  by  the  same  arguments 
based  on  (^^4,  —  ^t;,.)  as  before.    ■ 
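
The integer-valuedness step used in the proof above can be made explicit. The sketch below assumes that $\hat M^*$ denotes the data-dependent choice and $M^*$ the optimal number of instruments, that $M^*$ grows at rate $\log n$, and that the relative error has the rate attributed to Proposition (4.3) elsewhere in this appendix, with $s'$ a fixed constant. It is a restatement under these assumptions, not a replacement for the argument in the text.

```latex
\begin{align*}
\hat M^* - M^*
  &= M^* \left( \hat M^*/M^* - 1 \right)
   = O(\log n)\, O_p\!\left( n^{-1/2} (\log n)^{1/2 + s'} \right)
   = o_p(1), \\
P\left( \hat M^* \neq M^* \right)
  &= P\left( |\hat M^* - M^*| \geq 1 \right) \to 0 .
\end{align*}
```

The second line uses only that both quantities are integers, so any nonzero difference is at least one in absolute value.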

Proof  of  Proposition  (5.1):  We  consider  Edi  and  EHiDdj.  First,  Edi  =  0  for  i  <  3.  The 
terms  d^.d^,  ...,dc)  are  of  lower  order  by  Lemmas  (B.28,B.30-B.33).  The  terms  EHzD~^dj  are  all  of 
lower  order.  The  largest  order  term  is  therefore  Ed^.  By  the  proof  of  Lemma  (B.44)  it  follows  that 
Eds  =  M/VnAiJ(t)'^{x)dx  +  o{M/y/n).  m 

Proof of Theorem (5.3): By Proposition (4.1) it follows that $\hat A_1 - A_1 = O_p(n^{-1/2})$. By the
same  arguments  as  in  the  proof  of  Lemma  (B.47),  sup;^^  '^2Mh  ~  "^^M  =  Op{n~^^'^).  Note  that  (T2m 
is  uniformly  continuous  in  /c  G  /C^  for  each  j  —  0,  1,...,t.  It  is  not  uniformly  continuous  in  fc  6  /Cg, 
however.  Note  that  ijj  ^UL  =>  ijjj  ^  ^  such  that  for  k{x)  €  K.j  it  follows  that  kj  ^  0  and  fc^  =  0  for  all 
i  <  j.  We  therefore  analyze  the  problem  of  finding  y*  for  j  fixed,  where  A:-'*  =  argmin(5n,j(^)  and 


Q^^j{k)  =  A(j      (}>{k[x)fdx 


Also  define  and  Qj{k)  =  A  f/^  (t){xfdx\    +k]B^^yK.  Then,  uniformly  for  k  e  /C^  \ima2M^n/Mf  = 
k'^B'-i'i/K*  such  that  (7^  - .  -n/Mf  -  k'^B'^^^K*  =  Opin''^/'^  (logn)^/^+'').  Next,  note  that 

-^  ^  l\'lr-p  ,n.  -' 


log  n      log  n       \  M^         j  log  n 

where  M*/  logn  =  (-2  log  A)"  Vo(l).We  have  shown  that  sup;,g^,  \Qn,j{k)  -  Qj{k)\  =  OpipT^I'^  (logn)^/^+^') 
because  Qj  is  uniformly  continuous  in  k  for  each  j.  It  then  follows  from  standard  arguments  that 
sup^  y*  -  k'i*  =  OpiyTT^I'^  (logn)^''^+*').  To  find  the  optimum  in  /C,  we  now  define  k*  =  U*  with  q  = 
argmaxq>,'  Qn.qik'^*)-  But  now,  q  is  countable  and  finite  and  for  each  g,  Qn,q{k''*)  converges  to  a  fixed 
value.  Thus,  q  converges.  For  the  second  part  note  that  M*  (k*j  /M*  (fc*)-l  =  Op  (n"^/^  (logn)^/^"^*' j 
because  it  can  be  checked  easily  that  the  proof  of  Proposition  (4.3)  still  goes  through  when  k  is  replaced 
with  A:*.  Then,  if  g'  =  1, 

<P  (k[i/M*  (r)),  ^\oga,^j.^^.y^  -  <J>  (k*{ilM*  (r)),  yiogaj,^.( 

=     (V'l  -  V'l)  i/M*  {k*)  +  Op  (l/M*  (fc*' 
=     Op(n-i/2(logn)i/2+ma.x(.'-g.-i)y 
When  (f  >  1  the  error  is  Op{n~^^~  (logn)  '  +™^^(«  -9,-  J-j  g^(,|-^  ^j^g^^  ^j^g  same  arguments  as  in  the  proof
of Theorem (4.4) go through. For the last part of the theorem first consider $q' = 1$ and let $k(x) = 1 - x^2$
such that $\int \phi^2(x)dx = 64/45 < 2$ with $k_2 = 1$. Now consider a perturbation to $k(\cdot)$, say
$k_\varepsilon(x) = 1 - \varepsilon x - x^2$ such that $k_1 = \varepsilon$. In other words, for $k_\varepsilon$ the approximate MSE is
$A\left(\int \phi_\varepsilon(x)^2 dx\right)^2 + \varepsilon^2 B^{(1)}/\kappa^* + B_1/\kappa^*$. We need to show that by choosing $\varepsilon$ this can be
made smaller than the approximate MSE for the truncated kernel, which is $4A + B_1/\kappa^*$. Note that
$\int \phi_\varepsilon^2(x)dx = \int \phi^2(x)dx + \delta(\varepsilon)$ where
$\delta(\varepsilon) = -\varepsilon + \frac{8}{21}\varepsilon^2 + \frac{4}{3}\varepsilon^3 + \frac{2}{5}\varepsilon^4$.
Choose $\varepsilon$ such that $\left(\varepsilon^2 \sup_{\Theta_0}\left(B^{(1)}/\kappa^*\right) + \delta(\varepsilon)\right)/A < 4 - (64/45)^2$, which
is possible because of the properties of $\Theta_0$. This shows that $k_\varepsilon$ dominates the truncated kernel when
$M^*$ is used as a bandwidth choice. But then clearly, if $M$ is chosen optimally for $k_\varepsilon$ it cannot do worse
than with $M^*$. Similar arguments can be used to handle the case where $q' > 1$. ■
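
The constants in the last part of this proof can be checked with exact rational arithmetic. The sketch below assumes that $\phi(x) = 2k(x) - k(x)^2$ and that the integrals run over $[-1, 1]$ with $k$ extended symmetrically. Neither assumption is stated in this part of the appendix; the definition is adopted here only because it reproduces the quoted values 2, 64/45 and $\delta(\varepsilon)$. The helper names are hypothetical.

```python
from fractions import Fraction


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (ascending powers)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_sub(a, b):
    """Subtract polynomial b from a, padding the shorter list with zeros."""
    n = max(len(a), len(b))
    a = a + [Fraction(0)] * (n - len(a))
    b = b + [Fraction(0)] * (n - len(b))
    return [x - y for x, y in zip(a, b)]


def integral_phi_squared(k):
    """Integral of phi(x)^2 over [-1, 1], with phi = 2k - k^2 (assumed).

    k holds the kernel's polynomial coefficients on [0, 1]; the kernel is
    assumed symmetric, so the integral is twice the integral over [0, 1].
    """
    phi = poly_sub([2 * c for c in k], poly_mul(k, k))
    phi2 = poly_mul(phi, phi)
    half = sum(c / (power + 1) for power, c in enumerate(phi2))
    return 2 * half


def delta(eps):
    """delta(eps) as quoted in the proof."""
    return (-eps + Fraction(8, 21) * eps**2
            + Fraction(4, 3) * eps**3 + Fraction(2, 5) * eps**4)


truncated = [Fraction(1)]                              # k(x) = 1
parabolic = [Fraction(1), Fraction(0), Fraction(-1)]   # k(x) = 1 - x^2
eps = Fraction(1, 10)                                  # illustrative value only
perturbed = [Fraction(1), -eps, Fraction(-1)]          # k(x) = 1 - eps*x - x^2

print(integral_phi_squared(truncated))                 # truncated kernel
print(integral_phi_squared(parabolic))                 # k(x) = 1 - x^2
print(integral_phi_squared(perturbed) - integral_phi_squared(parabolic))
print(delta(eps))
```

Under the stated assumption the last two printed values coincide, and $\delta(\varepsilon)$ is negative at the illustrative value $\varepsilon = 1/10$, so the perturbation lowers the integral relative to 64/45.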

Proof  of  Theorem  (6.1)  We  consider  the  expansion  of  ,/n  {Pn.M  "  0)^^  before.  The  analysis  of 
the  MSE  of  \/n  (,8^  m  ~  0)  '^  then  the  same  as  the  analysis  for  ^/n  {13^^^^  —  /?)  where  we  replace  d^  by 

db  =  d^ 7=-4i   /  <f)  {x)dx 

Vn       J 


and  the  additional  term  dis  =  M/^{A[  -  A\)  j (fp'{x)dx  =  Op{M/n),  where  the  order  of  magnitude 
follows  from  Proposition  (4.1),  needs  to  be  considered.  First  note  that  Ed^  —  -^A'j  j  (}P'{x)dx  =  o(l). 
Then  Edc^d'r,  =  E{d5-  Edc,){dz-  Ed^)'  +  o{\).  From  the  proof  of  Lemma  (B.44)  it  follows  that  Ed^d'^  = 
0{M/n).  Also  E('H222D-'^dod'r,D-'^/'^£  =  o{M/n)  by  Lemma  (B.41)  and  du  =  Op{M/n)  together 
with  Lemma  (B.40)  shows  that  all  remaining  terms  are  at  most  of  order  M/n.  By  Remark  (4),  to 
first  order,  M*  does  not  depend  on  the  constants  of  the  M/n  terms.  Thus  it  is  possible  to  minimize 
Mp/n-logCTM-  Then,  M*/M*  - 1  =  Opin''^''^  (logn)^/^+^')  follows  by  Proposition  (4.3).  Finally,  note 
that  ■jM*(M;/M*-l\b~}^Aij(l)'^{x)dx  =  Op{n-^^^  {\ogny^^-^''  ^/M;)  =  Op{l)  by  Proposition 
(4.3)  and  Lemma  (B.47).  In  light  of  Theorem  (4.4)  this  establishes  the  last  part  of  the  Theorem.  ■ 

Table 1: Performance of GMM Estimators with θ =

                n = 128                       n = 512
Estimator  φ    Bias      MAE       MSE       Bias      MAE       MSE
OLS        0.1  0.51652   0.51672   0.52458   0.5174    0.52002   0.52203
GMM-1           0.54524   1.3457    2.5025    0.5095    1.0594    1.653
GMM-20          0.4481    0.45146   0.47529   0.43876   0.44586   0.4726
KGMM-20         0.44917   0.47111   0.53162   0.44736   0.46461   0.52371
GMM-Opt         0.48439   1.0699    2.1788    0.47735   0.8655    1.4054
BGMM-Opt        0.39075   1.0655    3.6218    0.33739   0.69265   0.99113
KGMM-Opt        0.48483   0.75581   1.1466    0.46242   0.66349   0.92119
BKGMM-Opt       0.4259    0.85469   1.3366    0.38961   0.77809   1.0823

OLS        0.3  0.52227   0.52425   0.53233   0.52319   0.52238   0.52433
GMM-1           0.37319   0.92571   1.7589    0.19208   0.64946   2.8861
GMM-20          0.4699    0.47633   0.49813   0.4479    0.45124   0.47233
KGMM-20         0.45252   0.46546   0.51769   0.40358   0.41276   0.4601
GMM-Opt         0.41832   0.8571    1.6191    0.2193    0.60238   2.8213
BGMM-Opt        0.1965    0.84522   1.6398    -0.01157  0.48039   0.64576
KGMM-Opt        0.40261   0.65701   0.93193   0.26319   0.46613   0.68559
BKGMM-Opt       0.26391   0.71097   1.0397    0.063104  0.50435   0.80424

OLS        0.5  0.46675   0.46775   0.47616   0.47121   0.46856   0.47072
GMM-1           0.064923  0.36801   0.63065   0.012124  0.17127   0.22942
GMM-20          0.4193    0.42041   0.44244   0.2859    0.28971   0.31147
KGMM-20         0.33826   0.35308   0.40205   0.18586   0.19964   0.23394
GMM-Opt         0.076623  0.36358   0.6227    0.01198   0.17069   0.2287
BGMM-Opt        -0.20138  0.52444   0.70699   -0.17292  0.27818   0.37291
KGMM-Opt        0.085374  0.32911   0.52499   0.013643  0.16615   0.21996
BKGMM-Opt       0.021164  0.36326   0.55213   -0.02036  0.17825   0.24791

Table 2: Performance of GMM Estimators with θ = 0

                n = 128                       n = 512
Estimator  φ    Bias      MAE       MSE       Bias      MAE       MSE
OLS        0.1  0.49705   0.49725   0.50385   0.49483   0.49372   0.49516
GMM-1           0.49461   0.98011   1.8786    0.48621   1.1214    5.2146
GMM-20          0.50373   0.50334   0.52432   0.49198   0.49328   0.51224
KGMM-20         0.49943   0.51901   0.58448   0.48642   0.49693   0.55721
GMM-Opt         0.51603   0.82588   1.5626    0.48211   0.95272   5.0932
BGMM-Opt        0.48989   1.1095    2.8016    0.45823   1.326     12.9356
KGMM-Opt        0.50207   0.69179   1.1779    0.48339   0.66513   0.89137
BKGMM-Opt       0.52045   0.76303   1.1456    0.46361   0.74174   1.0006

OLS        0.3  0.45384   0.45547   0.4613    0.45384   0.45403   0.45568
GMM-1           0.28972   0.66679   1.2017    0.1072    0.47244   0.90049
GMM-20          0.44322   0.44483   0.46525   0.40133   0.40931   0.4302
KGMM-20         0.43339   0.4501    0.51347   0.34142   0.35464   0.40636
GMM-Opt         0.36738   0.62256   1.1651    0.2135    0.48551   0.87465
BGMM-Opt        0.28444   0.66182   1.101     0.11745   0.35947   0.49946
KGMM-Opt        0.36569   0.52213   0.7151    0.20702   0.39769   0.57164
BKGMM-Opt       0.28996   0.57883   0.84856   0.11337   0.40762   0.66559

OLS        0.5  0.38046   0.37889   0.38487   0.37494   0.37593   0.3775
GMM-1           0.042111  0.29423   0.54636   0.015656  0.13132   0.1768
GMM-20          0.32936   0.32894   0.35015   0.21703   0.21913   0.23704
KGMM-20         0.2695    0.28151   0.32307   0.15066   0.1643    0.19171
GMM-Opt         0.08868   0.31714   0.58005   0.030298  0.14035   0.18463
BGMM-Opt        0.04296   0.27811   0.41247   0.016908  0.13559   0.18328
KGMM-Opt        0.090384  0.28488   0.44025   0.02436   0.13273   0.17765
BKGMM-Opt       0.054996  0.28023   0.42958   0.015666  0.13461   0.18303

Table 3: Performance of GMM Estimators with θ = .5

                n = 128                       n = 512
Estimator  φ    Bias      MAE       MSE       Bias      MAE       MSE
OLS        0.1  0.4708    0.47223   0.4798    0.46821   0.47019   0.47208
GMM-1           0.52711   1.1213    2.0323    0.52264   1.1775    2.2054
GMM-20          0.39267   0.39842   0.42841   0.36723   0.37541   0.40458
KGMM-20         0.40082   0.42934   0.4976    0.37235   0.40024   0.45747
GMM-Opt         0.41931   0.80943   1.357     0.41969   0.86022   1.5581
BGMM-Opt        0.32466   1.2808    13.0744   0.30929   0.60111   0.88186
KGMM-Opt        0.43155   0.67594   0.9787    0.40402   0.67063   0.97716
BKGMM-Opt       0.35219   0.70706   0.98258   0.32216   0.76215   1.232

OLS        0.3  0.38604   0.38641   0.3945    0.38685   0.38754   0.38952
GMM-1           0.19769   0.82882   1.6984    0.087729  0.496     0.87353
GMM-20          0.27067   0.28668   0.31713   0.24542   0.25758   0.28589
KGMM-20         0.26555   0.30311   0.36346   0.22235   0.25503   0.30632
GMM-Opt         0.23217   0.62133   1.0808    0.13734   0.41268   0.70857
BGMM-Opt        0.098508  0.57105   1.1184    0.018366  0.29295   0.46272
KGMM-Opt        0.24356   0.50607   0.742     0.16227   0.35728   0.54694
BKGMM-Opt       0.13393   0.53538   0.84079   0.025493  0.35742   0.59875

OLS        0.5  0.27857   0.28049   0.28786   0.28095   0.2819    0.28394
GMM-1           0.011779  0.3225    0.67531   -0.00211  0.11246   0.15008
GMM-20          0.15784   0.16854   0.19507   0.1009    0.10759   0.12591
KGMM-20         0.13523   0.16429   0.20386   0.065555  0.09014   0.11271
GMM-Opt         0.058289  0.253     0.46363   0.01898   0.10532   0.14026
BGMM-Opt        -0.01254  0.21014   0.33764   -0.02236  0.095026  0.12984
KGMM-Opt        0.065003  0.22165   0.34951   0.012326  0.10667   0.13995
BKGMM-Opt       0.003854  0.22473   0.42831   -0.01114  0.094756  0.13116

Table 4: Performance of GMM Estimators with θ = .5 and Heteroskedasticity

                n = 128             n = 512
Estimator  φ    Bias      IDR       Bias      IDR
OLS        0.1  0.36203   0.39565   0.37789   0.22017
GMM-1           0.50136   5.22      0.51375   8.5451
GMM-20          0.36409   0.9079    0.43471   1.163
KGMM-20         0.36288   0.99398   0.43279   1.2035
GMM-Opt         0.37306   1.8038    0.44232   2.2475
BGMM-Opt        0.35293   1.0481    0.42576   1.1657
KGMM-Opt        0.41278   2.0442    0.49196   2.7351
BKGMM-Opt       0.38757   1.931     0.48469   2.158

OLS        0.3  0.28962   0.29852   0.31254   0.19993
GMM-1           0.19565   2.991     0.039369  2.5622
GMM-20          0.24842   0.65978   0.28085   0.9905
KGMM-20         0.248     0.78716   0.26538   0.93733
GMM-Opt         0.22376   1.2064    0.22999   1.1617
BGMM-Opt        0.24336   0.73583   0.27032   0.97506
KGMM-Opt        0.2297    1.4793    0.24377   1.62
BKGMM-Opt       0.23773   1.37      0.25496   1.5437

OLS        0.5  0.21107   0.26352   0.22779   0.15279
GMM-1           -0.00746  1.0207    -0.00346  0.60612
GMM-20          0.12121   0.50708   0.08675   0.5928
KGMM-20         0.11991   0.54298   0.076     0.47081
GMM-Opt         0.082524  0.57197   0.072163  0.4773
BGMM-Opt        0.1116    0.49428   0.084331  0.57169
KGMM-Opt        0.077325  0.67164   0.057138  0.46211
BKGMM-Opt       0.086783  0.65593   0.06762   0.50118
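
The column statistics in Tables 1 to 4 can be computed from Monte Carlo draws along the following lines. This is a minimal sketch under assumptions: the tables do not restate their definitions here, so MAE is taken to be the mean absolute error, IDR the interdecile range (90th minus 10th percentile), and a root MSE is returned alongside the MSE. The function name `summarize` and the toy draws are hypothetical and are not taken from the paper's simulation design.

```python
import math
import statistics


def summarize(estimates, true_value):
    """Summary statistics of simulated estimates around a known true value."""
    errors = [est - true_value for est in estimates]
    bias = statistics.fmean(errors)                    # mean error
    mae = statistics.fmean([abs(e) for e in errors])   # mean absolute error (assumed)
    mse = statistics.fmean([e * e for e in errors])    # mean squared error
    deciles = statistics.quantiles(estimates, n=10)    # nine decile cut points
    return {
        "bias": bias,
        "mae": mae,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "idr": deciles[-1] - deciles[0],               # interdecile range (assumed)
    }


# Toy draws for illustration only.
draws = [0.9, 1.0, 1.1, 1.2]
stats = summarize(draws, true_value=1.0)
print(stats)
```

A quantile-based spread such as the IDR remains well defined when the sampling distribution has heavy tails, which is one reason it can be preferred to moment-based measures in a design with heteroskedasticity.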